SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view
3D Object Detection
- URL: http://arxiv.org/abs/2307.11477v1
- Date: Fri, 21 Jul 2023 10:28:19 GMT
- Title: SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view
3D Object Detection
- Authors: Jinqing Zhang, Yanan Zhang, Qingjie Liu, Yunhong Wang
- Abstract summary: We propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features.
We also propose BEV-Paste, an effective data augmentation strategy that closely matches the semantic-aware BEV features.
Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance.
- Score: 46.92706423094971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, pure camera-based Bird's-Eye-View (BEV) perception has provided a
feasible solution for economical autonomous driving. However, existing
BEV-based multi-view 3D detectors generally transform all image features into
BEV features, without considering that the large proportion of background
information may submerge the object information. In this paper, we propose
Semantic-Aware BEV Pooling (SA-BEVPool), which filters out background
information according to the semantic segmentation of image features and
transforms image features into semantic-aware BEV features. Accordingly, we
propose BEV-Paste, an effective data augmentation strategy that closely matches
the semantic-aware BEV features. In addition, we design a Multi-Scale
Cross-Task (MSCT) head, which combines task-specific and cross-task information
to predict depth distribution and semantic segmentation more accurately,
further improving the quality of the semantic-aware BEV features. Finally, we
integrate the above modules into a novel multi-view 3D object detection
framework, namely SA-BEV. Experiments on nuScenes show that SA-BEV achieves
state-of-the-art performance. Code is available at
https://github.com/mengtan00/SA-BEV.git.
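To make the pooling idea concrete, the sketch below shows one way semantic-aware BEV pooling could look in PyTorch: frustum features are weighted by their predicted depth probability and semantic score, points below a foreground threshold are discarded, and the rest are scatter-added into the BEV grid. This is a minimal illustration written for this summary, not the authors' code; the function name, the fg_thresh parameter, and the assumed input layout are all hypothetical, and the official repository linked above is the reference implementation.

import torch

def semantic_aware_bev_pool(img_feats, depth_probs, seg_scores, bev_coords,
                            bev_size=(128, 128), fg_thresh=0.25):
    # Hypothetical sketch of semantic-aware BEV pooling (not the official code).
    # img_feats:   (N, C) features of the frustum points lifted from the images
    # depth_probs: (N,)   predicted depth-bin probability of each frustum point
    # seg_scores:  (N,)   foreground (semantic) score of the source pixel
    # bev_coords:  (N, 2) integer BEV grid coordinates of each frustum point
    weights = depth_probs * seg_scores  # depth- and semantics-weighted pooling

    # Discard background points so they cannot submerge object information in BEV.
    keep = seg_scores > fg_thresh
    feats, weights, coords = img_feats[keep], weights[keep], bev_coords[keep]

    # Scatter-add the weighted features into the flattened BEV grid.
    H, W = bev_size
    bev = torch.zeros(H * W, feats.shape[1], dtype=feats.dtype)
    flat_idx = (coords[:, 0] * W + coords[:, 1]).long()
    bev.index_add_(0, flat_idx, feats * weights.unsqueeze(1))
    return bev.view(H, W, -1)

In SA-BEV, the depth distribution and semantic segmentation used for this weighting are predicted by the Multi-Scale Cross-Task head; the threshold-based filtering above is only meant to illustrate how background information can be kept out of the BEV feature.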
Related papers
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks [28.024042528077125]
Bird's-Eye-View (BEV) 3D Object Detection is a crucial multi-view technique for autonomous driving systems.
We propose a novel method named BEV Slice Attention Network (BEV-SAN) for exploiting the intrinsic characteristics of different heights.
arXiv Detail & Related papers (2022-12-02T15:14:48Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations become more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
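As a rough illustration of core problem (a), the following lift-splat-style sketch reconstructs 3D positions from a perspective view using camera intrinsics and a predicted depth distribution, then bins the depth-weighted features into a BEV grid. All names, shapes, and defaults here are assumptions made for this summary, not code from any of the listed papers.

import torch

def lift_to_bev(img_feats, depth_probs, intrinsics, cam_to_ego, depth_bins,
                bev_range=(-50.0, 50.0), bev_size=(128, 128)):
    # Sketch of a perspective-to-BEV view transform (lift-splat style).
    # img_feats:   (H, W, C) image features
    # depth_probs: (H, W, D) per-pixel depth-bin probabilities
    # intrinsics:  (3, 3)    camera intrinsic matrix
    # cam_to_ego:  (4, 4)    camera-to-ego-frame transform
    # depth_bins:  (D,)      metric depth of each bin
    H, W, C = img_feats.shape

    # Build a frustum of 3D points: every pixel at every candidate depth.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()    # (H, W, 3)
    rays = pix @ torch.inverse(intrinsics).T                            # (H, W, 3)
    pts_cam = rays[:, :, None, :] * depth_bins[None, None, :, None]     # (H, W, D, 3)

    # Move the frustum points into the ego frame.
    ones = torch.ones(H, W, depth_bins.shape[0], 1)
    pts_ego = (torch.cat([pts_cam, ones], dim=-1).reshape(-1, 4) @ cam_to_ego.T)[:, :3]

    # "Lift": weight each pixel's feature by its depth probability; "splat": bin into BEV.
    feats = (img_feats[:, :, None, :] * depth_probs[..., None]).reshape(-1, C)
    lo, hi = bev_range
    Hb, Wb = bev_size
    cell = (hi - lo) / Hb
    ix = ((pts_ego[:, 0] - lo) / cell).long().clamp(0, Hb - 1)
    iy = ((pts_ego[:, 1] - lo) / cell).long().clamp(0, Wb - 1)
    bev = torch.zeros(Hb * Wb, C)
    bev.index_add_(0, ix * Wb + iy, feats)
    return bev.view(Hb, Wb, C)

A real detector would mask out points falling outside the BEV range (rather than clamping them to the border) and would use an efficient pooling operator, but the sketch captures how the lost depth dimension is reintroduced before features reach the BEV grid.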
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Detecting objects in Bird's-Eye-View (BEV) is superior to other 3D detection paradigms for autonomous driving and robotics.
However, transforming image features into BEV necessitates special operators to conduct feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
arXiv Detail & Related papers (2022-08-19T15:19:20Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [39.253627257740085]
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems.
We present a new framework termed BEVFormer, which learns unified BEV representations with transformers to support multiple autonomous driving perception tasks.
We show that BEVFormer remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions.
arXiv Detail & Related papers (2022-03-31T17:59:01Z)