SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
- URL: http://arxiv.org/abs/2308.09244v2
- Date: Tue, 5 Sep 2023 11:38:10 GMT
- Title: SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
- Authors: Haisong Liu, Yao Teng, Tao Lu, Haiguang Wang, Limin Wang
- Abstract summary: SparseBEV is a fully sparse 3D object detector that outperforms the dense counterparts.
On the test split of nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS.
- Score: 20.51396212498941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-based 3D object detection in BEV (Bird's Eye View) space has drawn
great attention over the past few years. Dense detectors typically follow a
two-stage pipeline by first constructing a dense BEV feature and then
performing object detection in BEV space, which suffers from complex view
transformations and high computation cost. On the other hand, sparse detectors
follow a query-based paradigm without explicit dense BEV feature construction,
but achieve worse performance than their dense counterparts. In this paper, we
find that the key to mitigating this performance gap is the adaptability of the
detector in both BEV and image space. To achieve this goal, we propose
SparseBEV, a fully sparse 3D object detector that outperforms the dense
counterparts. SparseBEV contains three key designs, which are (1)
scale-adaptive self attention to aggregate features with adaptive receptive
field in BEV space, (2) adaptive spatio-temporal sampling to generate sampling
locations under the guidance of queries, and (3) adaptive mixing to decode the
sampled features with dynamic weights from the queries. On the test split of
nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS. On
the val split, SparseBEV achieves 55.8 NDS while maintaining a real-time
inference speed of 23.5 FPS. Code is available at
https://github.com/MCG-NJU/SparseBEV.
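The query-driven pipeline described in the abstract can be caricatured with a toy NumPy sketch of designs (2) adaptive sampling and (3) adaptive mixing. This is not the paper's implementation: every weight matrix below is a random stand-in for a learned parameter, the feature map is random, and nearest-neighbour lookup stands in for the bilinear/spatio-temporal sampling a real detector would use.

```python
import numpy as np

rng = np.random.default_rng(0)
num_queries, dim, num_points = 4, 8, 3

# Hypothetical learned projections (random stand-ins for trained weights).
W_offset = rng.normal(size=(dim, num_points * 2)) * 0.1        # query -> 2D sampling offsets
W_mix = rng.normal(size=(dim, num_points * num_points)) * 0.1  # query -> dynamic mixing matrix

queries = rng.normal(size=(num_queries, dim))
ref_points = rng.uniform(0.2, 0.8, size=(num_queries, 2))  # normalized reference points
feature_map = rng.normal(size=(16, 16, dim))               # one pseudo image/BEV feature map

def sample_nearest(fmap, xy):
    """Nearest-neighbour lookup at normalized coords (stand-in for bilinear sampling)."""
    h, w, _ = fmap.shape
    i = np.clip((xy[..., 1] * (h - 1)).round().astype(int), 0, h - 1)
    j = np.clip((xy[..., 0] * (w - 1)).round().astype(int), 0, w - 1)
    return fmap[i, j]

# (2) Adaptive sampling: offsets are predicted from each query, so where the
# detector looks depends on the query content, not a fixed grid.
offsets = (queries @ W_offset).reshape(num_queries, num_points, 2) * 0.1
locs = np.clip(ref_points[:, None, :] + offsets, 0.0, 1.0)
sampled = sample_nearest(feature_map, locs)                 # (Q, P, C)

# (3) Adaptive mixing: a per-query mixing matrix, also predicted from the
# query, blends the sampled point features before updating the query.
mix = (queries @ W_mix).reshape(num_queries, num_points, num_points)
mixed = mix @ sampled                                       # (Q, P, C)
updated = queries + mixed.mean(axis=1)                      # pooled residual update
```

Design (1), scale-adaptive self attention, is not sketched here; in the paper it additionally lets queries interact in BEV space with a receptive field whose scale is likewise predicted from the queries.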
Related papers
- BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection [47.7933708173225]
Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection.
This paper introduces a "modernized" dense BEV framework dubbed BEVNeXt.
On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks.
arXiv Detail & Related papers (2023-12-04T07:35:02Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
- BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)
- MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation [104.12419434114365]
In real-world applications, sensor corruptions and failures lead to inferior performance.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior art by a large margin on both full and corrupted modalities.
arXiv Detail & Related papers (2023-04-19T16:37:17Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning [7.6887888234987125]
We propose a learning scheme that distills target inner-geometry from the LiDAR modality into camera-based BEV detectors.
TiG-BEV can effectively boost BEVDepth by +2.3% NDS and +2.4% mAP, along with BEVDet by +9.1% NDS and +10.3% mAP on nuScenes val set.
arXiv Detail & Related papers (2022-12-28T17:53:43Z)
- PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Bird's-Eye-View (BEV) detectors are superior to other 3D detectors for autonomous driving and robotics.
However, transforming image features into BEV requires special operators to conduct feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
arXiv Detail & Related papers (2022-08-19T15:19:20Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M$^2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$^2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.