BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View
Recognition via Perspective Supervision
- URL: http://arxiv.org/abs/2211.10439v1
- Date: Fri, 18 Nov 2022 18:59:48 GMT
- Title: BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View
Recognition via Perspective Supervision
- Authors: Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang
Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
- Abstract summary: We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
- Score: 101.36648828734646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel bird's-eye-view (BEV) detector with perspective
supervision, which converges faster and better suits modern image backbones.
Existing state-of-the-art BEV detectors are often tied to certain depth
pre-trained backbones like VoVNet, hindering the synergy between booming image
backbones and BEV detectors. To address this limitation, we prioritize easing
the optimization of BEV detectors by introducing perspective space supervision.
To this end, we propose a two-stage BEV detector, where proposals from the
perspective head are fed into the bird's-eye-view head for final predictions.
To evaluate the effectiveness of our model, we conduct extensive ablation
studies focusing on the form of supervision and the generality of the proposed
detector. The proposed method is verified with a wide spectrum of traditional
and modern image backbones and achieves new SoTA results on the large-scale
nuScenes dataset. The code shall be released soon.
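The two-stage design described above lends itself to a compact sketch: a perspective head turns per-image features into proposals, the proposals are encoded as queries and concatenated with learned queries, and the BEV head produces the final predictions while both heads receive supervision. The following PyTorch sketch is a minimal illustration under those assumptions; module structure, query encoding, and all names are placeholders rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class PerspectiveHead(nn.Module):
    """Stand-in first-stage head: turns image features into proposal query embeddings."""
    def __init__(self, dim=256, num_proposals=100):
        super().__init__()
        self.embed = nn.Linear(dim, dim)
        self.num_proposals = num_proposals

    def forward(self, feats):                       # feats: (B, C, H, W)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H*W, C)
        top = tokens[:, : self.num_proposals]       # placeholder proposal selection
        return {"query_embed": self.embed(top)}

class BEVHead(nn.Module):
    """Stand-in second-stage head: lets queries attend to image features."""
    def __init__(self, dim=256, num_classes=10):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, feats, queries):
        kv = feats.flatten(2).transpose(1, 2)
        out, _ = self.attn(queries, kv, kv)
        return self.cls(out)

class TwoStageBEVDetector(nn.Module):
    """Perspective proposals seed the BEV head's queries; both heads get supervision."""
    def __init__(self, backbone, dim=256, num_learned_queries=900):
        super().__init__()
        self.backbone = backbone
        self.perspective_head = PerspectiveHead(dim)
        self.bev_head = BEVHead(dim)
        self.learned_queries = nn.Embedding(num_learned_queries, dim)

    def forward(self, images):
        feats = self.backbone(images)
        persp = self.perspective_head(feats)        # first stage: supervised in perspective space
        extra = self.learned_queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hybrid = torch.cat([persp["query_embed"], extra], dim=1)
        bev = self.bev_head(feats, hybrid)          # second stage: supervised in BEV space
        return persp, bev

# Toy usage: any backbone that maps (B, 3, H, W) -> (B, 256, h, w) would plug in here.
backbone = nn.Conv2d(3, 256, kernel_size=3, stride=32, padding=1)
persp_out, bev_out = TwoStageBEVDetector(backbone)(torch.randn(2, 3, 256, 704))
```

In training, the perspective head would be matched against per-view targets and the BEV head against BEV-space targets; the extra perspective-space signal is what the abstract credits with easing optimization of modern image backbones.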
Related papers
- Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation [4.9185678564997355]
Bird's-Eye-View (BEV) segmentation aims to establish a spatial mapping from the perspective view to the top view.
Recent studies have encountered difficulties in view transformation due to the disruption of BEV-agnostic features in image space.
arXiv Detail & Related papers (2024-10-21T12:00:52Z)
- Robust Bird's Eye View Segmentation by Adapting DINOv2 [3.236198583140341]
We adapt a vision foundation model, DINOv2, to BEV estimation using Low-Rank Adaptation (LoRA).
Our experiments show increased robustness of BEV perception under various corruptions.
We also showcase the effectiveness of the adapted representations in terms of fewer learnable parameters and faster convergence during training.
arXiv Detail & Related papers (2024-09-16T12:23:35Z)
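For reference, LoRA-style adaptation of a frozen backbone such as DINOv2 generally looks like the sketch below: the pretrained weights stay frozen and only small low-rank adapters are trained. This is a generic illustration rather than this paper's code; the `blocks`/`attn`/`qkv` attribute names assume a timm-style ViT.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(A x)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # the pretrained weights stay frozen
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)          # update starts at zero, so the base model is preserved
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))

def add_lora_to_attention(vit, r: int = 8):
    """Inject LoRA into the attention qkv projections of a ViT-style backbone.
    The attribute names (blocks, attn, qkv) follow timm-style ViTs and are assumptions."""
    for block in vit.blocks:
        block.attn.qkv = LoRALinear(block.attn.qkv, r=r)
    return vit
```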
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's-Eye-View (BEV) has demonstrated great potential for environment perception in 3D space.
Unsupervised domain-adaptive BEV, which learns effectively from unlabelled target-domain data, is largely under-explored.
We design DA-BEV, the first domain-adaptive camera-only BEV framework, which addresses these challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird's-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z)
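The two directions of view transformation that FB-BEV combines can be sketched generically: a forward (lift-splat style) projection that pushes depth-weighted image features into BEV cells, and a backward (BEVFormer style) projection that pulls image features at the locations where BEV grid centres project. The sketch below is a simplified single-camera illustration with precomputed geometry, not the paper's module.

```python
import torch
import torch.nn.functional as F

def forward_projection(img_feats, depth_probs, bev_index, num_bev_cells):
    """Lift-splat style forward transformation: push image features into BEV cells,
    weighted by a predicted per-pixel depth distribution.
    img_feats:   (C, H, W) image features
    depth_probs: (D, H, W) depth distribution per pixel
    bev_index:   (D*H*W,) long tensor giving the flat BEV cell index of each (depth bin, pixel),
                 precomputed from camera geometry (an assumption of this sketch)."""
    C = img_feats.shape[0]
    lifted = img_feats.unsqueeze(1) * depth_probs.unsqueeze(0)    # (C, D, H, W)
    bev = torch.zeros(C, num_bev_cells, device=img_feats.device)
    bev.index_add_(1, bev_index, lifted.reshape(C, -1))           # splat into BEV cells
    return bev

def backward_projection(img_feats, bev_uv):
    """BEVFormer-style backward transformation: pull image features at the projected
    locations of BEV grid centres.
    img_feats: (1, C, H, W); bev_uv: (1, Hb, Wb, 2) pixel coords normalised to [-1, 1]."""
    return F.grid_sample(img_feats, bev_uv, align_corners=False)  # (1, C, Hb, Wb)
```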
- VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection [47.926010021559314]
Transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks.
However, their performance in multi-view 3D object detection remains inferior to that of state-of-the-art (SOTA) convolutional-neural-network-based detectors.
We propose a novel BEV feature generation method, dual-view attention, which generates attention weights from both the BEV and camera view.
arXiv Detail & Related papers (2023-04-03T15:00:36Z)
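One plausible reading of "attention weights from both the BEV and camera view" is that a weight predicted from each BEV query is combined with a weight predicted from the corresponding sampled camera-view features before aggregation. The sketch below illustrates that reading only; it is not VoxelFormer's actual operator, and all shapes and names are assumptions.

```python
import torch
import torch.nn as nn

class DualViewAttention(nn.Module):
    """Combine per-point weights from the BEV query and from camera-view features."""
    def __init__(self, dim=256, num_points=4):
        super().__init__()
        self.bev_weight = nn.Linear(dim, num_points)   # weights predicted from the BEV query
        self.cam_weight = nn.Linear(dim, 1)            # weights predicted from camera features
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, bev_query, cam_feats):
        # bev_query: (B, Nq, C); cam_feats: (B, Nq, P, C) features sampled per query
        w_bev = self.bev_weight(bev_query).softmax(dim=-1)              # (B, Nq, P)
        w_cam = self.cam_weight(cam_feats).squeeze(-1).softmax(dim=-1)  # (B, Nq, P)
        w = w_bev * w_cam
        w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-6)             # renormalise combined weights
        fused = (w.unsqueeze(-1) * cam_feats).sum(dim=2)                # (B, Nq, C)
        return self.out_proj(fused)

# Toy usage: DualViewAttention()(torch.randn(2, 50, 256), torch.randn(2, 50, 4, 256))
```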
- SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images [26.34702432184092]
We propose the first self-supervised approach for generating a Bird's-Eye-View (BEV) semantic map using a single monocular image from the frontal view (FV).
In training, we overcome the need for BEV ground truth annotations by leveraging the more easily available FV semantic annotations of video sequences.
Our approach performs on par with the state-of-the-art fully supervised methods and achieves competitive results using only 1% of direct supervision in the BEV.
arXiv Detail & Related papers (2023-02-08T18:02:09Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
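Problem (a) above, relating a BEV grid to perspective views, ultimately rests on the pinhole projection of BEV cell centres into each camera. A minimal sketch with assumed intrinsics and extrinsics (all values illustrative):

```python
import torch

def project_bev_to_image(bev_xyz, K, R, t):
    """Project 3D BEV cell centres (ego/world frame) into image pixels.
    bev_xyz: (N, 3) points; K: (3, 3) intrinsics; R: (3, 3), t: (3,) world-to-camera extrinsics."""
    cam = bev_xyz @ R.T + t                      # world -> camera coordinates
    uvw = cam @ K.T                              # camera -> homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3].clamp_min(1e-6)
    in_front = cam[:, 2] > 0                     # keep only points in front of the camera
    return uv, in_front

# Toy usage with dummy calibration values:
K = torch.tensor([[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]])
pts = torch.tensor([[10.0, 2.0, 0.0], [30.0, -5.0, 0.0]])   # two BEV cell centres on the ground
uv, valid = project_bev_to_image(pts, K, torch.eye(3), torch.tensor([0.0, 0.0, 1.5]))
```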
- PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Detecting objects in Bird's-Eye-View (BEV) is superior to other 3D detection paradigms for autonomous driving and robotics.
However, transforming image features into BEV necessitates special operators to conduct feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
arXiv Detail & Related papers (2022-08-19T15:19:20Z)
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Bird's-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)
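The geometric prior in this line of work is typically inverse perspective mapping (IPM): under a flat-ground assumption, a homography relates BEV grid coordinates to image pixels, and a learned stage then refines the warped features. The sketch below shows only the generic IPM warp, not GitNet's actual pipeline, and assumes the homography comes from calibration.

```python
import torch
import torch.nn.functional as F

def ipm_warp(img_feats, H_img_from_bev, bev_h, bev_w):
    """Warp image-plane features onto a ground-plane BEV grid via a homography.
    img_feats:      (1, C, feat_h, feat_w) image features
    H_img_from_bev: (3, 3) homography mapping BEV grid coords (col, row, 1) to feature-map pixels."""
    feat_h, feat_w = img_feats.shape[-2:]
    ys, xs = torch.meshgrid(torch.arange(bev_h, dtype=torch.float32),
                            torch.arange(bev_w, dtype=torch.float32), indexing="ij")
    bev_pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    img_pts = bev_pts @ H_img_from_bev.T
    img_pts = img_pts[:, :2] / img_pts[:, 2:3].clamp_min(1e-6)          # perspective divide
    u = img_pts[:, 0] / (feat_w - 1) * 2 - 1                            # normalise for grid_sample
    v = img_pts[:, 1] / (feat_h - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).reshape(1, bev_h, bev_w, 2)
    return F.grid_sample(img_feats, grid, align_corners=True)           # (1, C, bev_h, bev_w)
```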