Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception
- URL: http://arxiv.org/abs/2301.07870v1
- Date: Thu, 19 Jan 2023 03:58:48 GMT
- Title: Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception
- Authors: Bin Huang, Yangguang Li, Enze Xie, Feng Liang, Luya Wang, Mingzhu
Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao
- Abstract summary: Pure camera-based Bird's-Eye-View (BEV) perception removes the need for expensive LiDAR sensors, making it a feasible solution for economical autonomous driving.
This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing real-time BEV perception on on-vehicle chips.
- Score: 43.080075390854205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, pure camera-based Bird's-Eye-View (BEV) perception has
removed the need for expensive LiDAR sensors, making it a feasible solution for
economical autonomous driving. However, most existing BEV solutions either
suffer from modest performance or require considerable resources to execute
on-vehicle inference. This paper proposes a simple yet effective framework,
termed Fast-BEV, which is capable of performing real-time BEV perception on
on-vehicle chips. Towards this goal, we first empirically find that the BEV
representation can be sufficiently powerful without expensive view
transformation or depth representation. Starting from the M2BEV baseline, we
further introduce (1) a strong data augmentation strategy in both image and BEV
space to avoid over-fitting, (2) a multi-frame feature fusion mechanism to
leverage temporal information, and (3) an optimized, deployment-friendly view
transformation to speed up inference. Through experiments, we show that the
Fast-BEV model family achieves considerable accuracy and efficiency on edge
devices. In particular, our M1 model (R18@256x704) runs at over 50 FPS on the
Tesla T4 platform with 47.0% NDS on the nuScenes validation set. Our largest
model (R101@900x1600) establishes a new state-of-the-art 53.5% NDS on the
nuScenes validation set. The code is released at:
https://github.com/Sense-GVT/Fast-BEV.
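The abstract's three ingredients lend themselves to a small illustration. The snippet below is a minimal, hypothetical PyTorch-style sketch of a depth-free, lookup-table-based view transformation in the spirit of point (3): because no per-pixel depth is predicted, the mapping from BEV voxels to image pixels can be precomputed once from the camera calibration and reused at inference time. Function names, tensor layouts, and the helper `build_projection_lut` are illustrative assumptions, not the released Fast-BEV implementation.

```python
# Minimal, illustrative sketch (assumed names; not the released Fast-BEV code):
# precompute which image pixel each BEV voxel projects to, then fill the BEV
# volume with a single gather instead of a depth-estimation step.
import torch

def build_projection_lut(voxel_centers, intrinsics, ego_to_cam, feat_hw):
    """Precompute, once per calibrated camera, a flat pixel index per voxel.

    voxel_centers: (N, 3) xyz of BEV voxel centers in the ego frame
    intrinsics:    (3, 3) camera intrinsic matrix
    ego_to_cam:    (4, 4) ego-to-camera transform
    feat_hw:       (H, W) spatial size of the image feature map
    Returns a (N,) LongTensor of flat pixel indices, -1 where a voxel is
    behind the camera or falls outside the image.
    """
    H, W = feat_hw
    ones = torch.ones_like(voxel_centers[:, :1])
    pts_cam = (ego_to_cam @ torch.cat([voxel_centers, ones], dim=1).T).T[:, :3]
    pix = (intrinsics @ pts_cam.T).T
    uv = pix[:, :2] / pix[:, 2:3].clamp(min=1e-5)   # perspective divide
    u, v = uv[:, 0].long(), uv[:, 1].long()
    valid = (pts_cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    return torch.where(valid, v * W + u, torch.full_like(u, -1))

def project_to_bev(img_feat, lut):
    """Scatter image features into BEV voxels with one gather (no depth net).

    img_feat: (C, H, W) feature map from one camera
    lut:      (N,) precomputed flat pixel index per voxel (-1 if invalid)
    Returns (N, C) voxel features, zero where no pixel projects.
    """
    C = img_feat.shape[0]
    flat = img_feat.reshape(C, -1)              # (C, H*W)
    vox = flat[:, lut.clamp(min=0)].T.clone()   # (N, C)
    vox[lut < 0] = 0
    return vox
```

For point (2), one common reading is to warp the BEV features of the previous few frames into the current ego frame and concatenate them along the channel dimension before the BEV encoder, so the extra temporal context costs little beyond a wider first convolution.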
Related papers
- Robust Bird's Eye View Segmentation by Adapting DINOv2 [3.236198583140341]
We adapt a vision foundation model, DINOv2, to BEV estimation using Low-Rank Adaptation (LoRA); a minimal LoRA sketch is given after this list.
Our experiments show increased robustness of BEV perception under various corruptions.
We also showcase the effectiveness of the adapted representations in terms of fewer learnable parameters and faster convergence during training.
arXiv Detail & Related papers (2024-09-16T12:23:35Z) - RoadBEV: Road Surface Reconstruction in Bird's Eye View [55.0558717607946]
Road surface conditions, especially geometry profiles, enormously affect driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance.
Bird's-Eye-View (BEV) perception offers immense potential for more reliable and accurate reconstruction.
This paper proposes two simple yet effective models for road elevation reconstruction in BEV, named RoadBEV-mono and RoadBEV-stereo.
arXiv Detail & Related papers (2024-04-09T20:24:29Z) - Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
BEVENet is 3x faster than contemporary state-of-the-art (SOTA) approaches on the nuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z) - U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z) - MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation [104.12419434114365]
In real-world applications, sensor corruptions and failures lead to inferior performances.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities.
arXiv Detail & Related papers (2023-04-19T16:37:17Z) - VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view
Attention for Multi-view 3D Object Detection [47.926010021559314]
Transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks.
However, their performance in multi-view 3D object detection remains inferior to the state-of-the-art (SOTA) of convolutional neural network based detectors.
We propose a novel BEV feature generation method, dual-view attention, which generates attention weights from both the BEV and camera view.
arXiv Detail & Related papers (2023-04-03T15:00:36Z) - Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline [76.48192454417138]
Bird's-Eye View (BEV) representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception.
This paper proposes a framework, termed Fast-BEV, which is capable of performing faster BEV perception on the on-vehicle chips.
arXiv Detail & Related papers (2023-01-29T18:43:31Z) - M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified
Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
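For the LoRA-based adaptation mentioned in the first related entry above, the following is a minimal sketch of the general Low-Rank Adaptation technique, not the code of that paper; class and variable names (`LoRALinear`, `lora_a`, `lora_b`) are hypothetical.

```python
# Minimal, generic LoRA wrapper: the pretrained linear layer is frozen and a
# trainable low-rank update B(A(x)) is added to its output.
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W_frozen(x) + (alpha / r) * B(A(x)); only A and B are trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # the low-rank update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: wrap one projection of a frozen backbone and count trainable params.
layer = LoRALinear(nn.Linear(768, 768), r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 12288
```

Wrapping, e.g., the attention projections of a frozen DINOv2 backbone this way leaves only the small A and B matrices to train on the BEV task, which is consistent with the fewer-parameters and faster-convergence claims in that entry.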
This list is automatically generated from the titles and abstracts of the papers on this site.