Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline
- URL: http://arxiv.org/abs/2301.12511v2
- Date: Tue, 9 Jul 2024 17:07:14 GMT
- Title: Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline
- Authors: Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao,
- Abstract summary: Bird's-Eye View (BEV) representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception.
This paper proposes a framework, termed Fast-BEV, which is capable of performing faster BEV perception on the on-vehicle chips.
- Score: 76.48192454417138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, perception task based on Bird's-Eye View (BEV) representation has drawn more and more attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources to execute on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV , which is capable of performing faster BEV perception on the on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive transformer based transformation nor depth representation. Our Fast-BEV consists of five parts, We novelly propose (1) a lightweight deployment-friendly view transformation which fast transfers 2D image feature to 3D voxel space, (2) an multi-scale image encoder which leverages multi-scale information for better performance, (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, (5) a multi-frame feature fusion mechanism to leverage the temporal information. Through experiments, on 2080Ti platform, our R50 model can run 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model and 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model. Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips. The code is released at: https://github.com/Sense-GVT/Fast-BEV.
Related papers
- Robust Bird's Eye View Segmentation by Adapting DINOv2 [3.236198583140341]
We adapt a vision foundational model, DINOv2, to BEV estimation using Low Rank Adaptation (LoRA)
Our experiments show increased robustness of BEV perception under various corruptions.
We also showcase the effectiveness of the adapted representations in terms of fewer learnable parameters and faster convergence during training.
arXiv Detail & Related papers (2024-09-16T12:23:35Z) - Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
BEVENet is 3$times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge.
Our experiments show that BEVENet is 3$times$ faster than contemporary state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2023-12-01T14:52:59Z) - U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z) - QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D
Object Detection [57.019527599167255]
Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved significant improvements.
We show in our paper that directly applying quantization in BEV tasks will 1) make the training unstable, and 2) lead to intolerable performance degradation.
Our method QD-BEV enables a novel view-guided distillation (VGD) objective, which can stabilize the quantization-aware training (QAT) while enhancing the model performance.
arXiv Detail & Related papers (2023-08-21T07:06:49Z) - MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation [104.12419434114365]
In real-world applications, sensor corruptions and failures lead to inferior performances.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities.
arXiv Detail & Related papers (2023-04-19T16:37:17Z) - Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception [43.080075390854205]
pure camera-based Bird's-Eye-View (BEV) perception removes expensive Lidar sensors, making it a feasible solution for economical autonomous driving.
This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing real-time BEV perception on the on-vehicle chips.
arXiv Detail & Related papers (2023-01-19T03:58:48Z) - M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified
Birds-Eye View Representation [145.6041893646006]
M$2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z) - BEVDet: High-performance Multi-camera 3D Object Detection in
Bird-Eye-View [15.560366079077449]
We contribute the BEVDet paradigm for pushing the performance boundary in 2D object detection task.
BeVDet is developed by following the principle of detecting the 3D objects in Bird-Eye-View (BEV), where route planning can be handily performed.
The proposed paradigm works well in multi-camera 3D object detection and offers a good trade-off between computing budget and performance.
arXiv Detail & Related papers (2021-12-22T10:48:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.