SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view
3D Object Detection
- URL: http://arxiv.org/abs/2307.11477v1
- Date: Fri, 21 Jul 2023 10:28:19 GMT
- Title: SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view
3D Object Detection
- Authors: Jinqing Zhang, Yanan Zhang, Qingjie Liu, Yunhong Wang
- Abstract summary: We propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features.
We also propose BEV-Paste, an effective data augmentation strategy that closely matches the semantic-aware BEV features.
Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance.
- Score: 46.92706423094971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, pure camera-based Bird's-Eye-View (BEV) perception has provided a
feasible solution for economical autonomous driving. However, existing
BEV-based multi-view 3D detectors generally transform all image features into
BEV features, without considering that the large proportion of background
information may submerge the object information. In this paper, we propose
Semantic-Aware BEV Pooling (SA-BEVPool), which filters out background
information according to the semantic segmentation of image features and
transforms image features into semantic-aware BEV features. Accordingly, we
propose BEV-Paste, an effective data augmentation strategy that closely matches
the semantic-aware BEV features. In addition, we design a Multi-Scale
Cross-Task (MSCT) head, which combines task-specific and cross-task information
to predict depth distribution and semantic segmentation more accurately,
further improving the quality of the semantic-aware BEV features. Finally, we
integrate the above modules into a novel multi-view 3D object detection
framework, namely SA-BEV. Experiments on nuScenes show that SA-BEV achieves
state-of-the-art performance. Code is available at
https://github.com/mengtan00/SA-BEV.git.
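To make the pooling idea concrete, the sketch below shows one way semantic-aware BEV pooling could look in PyTorch: frustum features are weighted by their predicted depth probability and semantic score, points below a foreground threshold are discarded, and the rest are scatter-added into the BEV grid. This is a minimal illustration written for this summary, not the authors' code; the function name, the fg_thresh parameter, and the assumed input layout are all hypothetical, and the official repository linked above is the reference implementation.

import torch

def semantic_aware_bev_pool(img_feats, depth_probs, seg_scores, bev_coords,
                            bev_size=(128, 128), fg_thresh=0.25):
    # Hypothetical sketch of semantic-aware BEV pooling (not the official code).
    # img_feats:   (N, C) features of the frustum points lifted from the images
    # depth_probs: (N,)   predicted depth-bin probability of each frustum point
    # seg_scores:  (N,)   foreground (semantic) score of the source pixel
    # bev_coords:  (N, 2) integer BEV grid coordinates of each frustum point
    weights = depth_probs * seg_scores  # depth- and semantics-weighted pooling

    # Discard background points so they cannot submerge object information in BEV.
    keep = seg_scores > fg_thresh
    feats, weights, coords = img_feats[keep], weights[keep], bev_coords[keep]

    # Scatter-add the weighted features into the flattened BEV grid.
    H, W = bev_size
    bev = torch.zeros(H * W, feats.shape[1], dtype=feats.dtype)
    flat_idx = (coords[:, 0] * W + coords[:, 1]).long()
    bev.index_add_(0, flat_idx, feats * weights.unsqueeze(1))
    return bev.view(H, W, -1)

In SA-BEV, the depth distribution and semantic segmentation used for this weighting are predicted by the Multi-Scale Cross-Task head; the threshold-based filtering above is only meant to illustrate how background information can be kept out of the BEV feature.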
Related papers
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks [28.024042528077125]
Bird's-Eye-View (BEV) 3D Object Detection is a crucial multi-view technique for autonomous driving systems.
We propose a novel method named BEV Slice Attention Network (BEV-SAN) for exploiting the intrinsic characteristics of different heights.
arXiv Detail & Related papers (2022-12-02T15:14:48Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations become more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
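As a rough illustration of core problem (a), the following lift-splat-style sketch reconstructs 3D positions from a perspective view using camera intrinsics and a predicted depth distribution, then bins the depth-weighted features into a BEV grid. All names, shapes, and defaults here are assumptions made for this summary, not code from any of the listed papers.

import torch

def lift_to_bev(img_feats, depth_probs, intrinsics, cam_to_ego, depth_bins,
                bev_range=(-50.0, 50.0), bev_size=(128, 128)):
    # Sketch of a perspective-to-BEV view transform (lift-splat style).
    # img_feats:   (H, W, C) image features
    # depth_probs: (H, W, D) per-pixel depth-bin probabilities
    # intrinsics:  (3, 3)    camera intrinsic matrix
    # cam_to_ego:  (4, 4)    camera-to-ego-frame transform
    # depth_bins:  (D,)      metric depth of each bin
    H, W, C = img_feats.shape

    # Build a frustum of 3D points: every pixel at every candidate depth.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()    # (H, W, 3)
    rays = pix @ torch.inverse(intrinsics).T                            # (H, W, 3)
    pts_cam = rays[:, :, None, :] * depth_bins[None, None, :, None]     # (H, W, D, 3)

    # Move the frustum points into the ego frame.
    ones = torch.ones(H, W, depth_bins.shape[0], 1)
    pts_ego = (torch.cat([pts_cam, ones], dim=-1).reshape(-1, 4) @ cam_to_ego.T)[:, :3]

    # "Lift": weight each pixel's feature by its depth probability; "splat": bin into BEV.
    feats = (img_feats[:, :, None, :] * depth_probs[..., None]).reshape(-1, C)
    lo, hi = bev_range
    Hb, Wb = bev_size
    cell = (hi - lo) / Hb
    ix = ((pts_ego[:, 0] - lo) / cell).long().clamp(0, Hb - 1)
    iy = ((pts_ego[:, 1] - lo) / cell).long().clamp(0, Wb - 1)
    bev = torch.zeros(Hb * Wb, C)
    bev.index_add_(0, ix * Wb + iy, feats)
    return bev.view(Hb, Wb, C)

A real detector would mask out points falling outside the BEV range (rather than clamping them to the border) and would use an efficient pooling operator, but the sketch captures how the lost depth dimension is reintroduced before features reach the BEV grid.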
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Detecting objects in Bird's-Eye-View (BEV) is superior to other 3D detection paradigms for autonomous driving and robotics.
However, transforming image features into BEV necessitates special operators to conduct feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
arXiv Detail & Related papers (2022-08-19T15:19:20Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [39.253627257740085]
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems.
We present a new framework termed BEVFormer, which learns unified BEV representations with transformers to support multiple autonomous driving perception tasks.
We show that BEVFormer remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions.
arXiv Detail & Related papers (2022-03-31T17:59:01Z)