OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection
- URL: http://arxiv.org/abs/2301.05711v1
- Date: Fri, 13 Jan 2023 06:02:31 GMT
- Title: OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection
- Authors: Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang
Li, Yanyong Zhang
- Abstract summary: OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent trend in multi-camera 3D object detection is to adopt a unified
bird's-eye-view (BEV) representation. However, directly transforming features
extracted from the image-plane view to BEV inevitably results in feature
distortion, especially around the objects of interest, making the objects blur
into the background. To address this, we propose OA-BEV, a network that can be
plugged into a BEV-based 3D object detection framework to bring out the
objects by incorporating object-aware pseudo-3D features and depth features.
Such features carry information about the objects' positions and 3D
structures. First, we explicitly guide the network to learn the depth
distribution with object-level supervision from each 3D object's center. Then, we
select the foreground pixels with a 2D object detector and project them into 3D
space for pseudo-voxel feature encoding. Finally, the object-aware depth
features and pseudo-voxel features are incorporated into the BEV representation
with a deformable attention mechanism. We conduct extensive experiments on the
nuScenes dataset to validate the merits of the proposed OA-BEV. Our method
achieves consistent improvements over the BEV-based baselines in terms of both
average precision and the nuScenes detection score. Our code will be published.
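The abstract describes a three-step recipe: categorical depth supervised at each 3D object's projected center, pseudo-voxel features lifted from 2D-detected foreground pixels, and attention-based fusion into the BEV map. The PyTorch sketch below is one plausible reading of those steps, not the authors' implementation: all shapes and helper names are assumptions, a single camera is shown, and plain multi-head attention stands in for the paper's deformable attention.

```python
# Hedged sketch of the OA-BEV recipe from the abstract (assumed shapes/names;
# single camera; dense attention in place of deformable attention).
import torch
import torch.nn.functional as F

def center_depth_loss(depth_logits, centers_uv, centers_depth, bin_edges):
    # depth_logits: (D, H, W) categorical depth logits for one image.
    # centers_uv: (N, 2) pixel coordinates of projected 3D box centers.
    u, v = centers_uv[:, 0].long(), centers_uv[:, 1].long()
    logits = depth_logits[:, v, u].t()                  # (N, D) logits at centers
    target = torch.bucketize(centers_depth, bin_edges)  # (N,) ground-truth depth bin
    return F.cross_entropy(logits, target.clamp(0, depth_logits.shape[0] - 1))

def pseudo_voxel_bev(img_feats, depth, boxes2d, K_inv, bev_hw, cell, max_x):
    # img_feats: (C, H, W); depth: (H, W) metric depth; boxes2d: (M, 4) xyxy.
    C, H, W = img_feats.shape
    mask = torch.zeros(H, W, dtype=torch.bool)
    for x1, y1, x2, y2 in boxes2d.long():               # keep only foreground pixels
        mask[y1:y2, x1:x2] = True
    v, u = mask.nonzero(as_tuple=True)
    ones = torch.ones_like(u, dtype=torch.float32)
    rays = K_inv @ torch.stack([u.float(), v.float(), ones])
    pts = rays * depth[v, u]                            # (3, P) camera-frame points
    feats = img_feats[:, v, u]                          # (C, P) features to lift
    gx = ((pts[0] + max_x) / cell).long().clamp(0, bev_hw[1] - 1)
    gz = (pts[2] / cell).long().clamp(0, bev_hw[0] - 1)
    idx = gz * bev_hw[1] + gx                           # flat BEV cell per 3D point
    bev = torch.zeros(C, bev_hw[0] * bev_hw[1])
    cnt = torch.zeros(bev_hw[0] * bev_hw[1])
    bev.index_add_(1, idx, feats)                       # mean-pool points per cell
    cnt.index_add_(0, idx, torch.ones_like(idx, dtype=torch.float32))
    return (bev / cnt.clamp(min=1)).view(C, *bev_hw)

def fuse_bev(bev_base, bev_obj, attn):
    # attn: nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True);
    # the paper fuses with deformable attention, approximated densely here.
    q = bev_base.flatten(1).t().unsqueeze(0)            # (1, HW, C) BEV queries
    kv = bev_obj.flatten(1).t().unsqueeze(0)            # (1, HW, C) object-aware keys
    out, _ = attn(q, kv, kv)
    return (q + out).squeeze(0).t().reshape_as(bev_base)
```

In this reading, `center_depth_loss` would be trained alongside the usual detection losses, and `pseudo_voxel_bev` would run once per camera before fusion.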
Related papers
- ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection Network
We propose 2D Region-oriented Attention for a BEV-based 3D Object Detection Network (ROA-BEV).
Our method increases the information content of ROA through a multi-scale structure.
Experiments on nuScenes show that ROA-BEV improves performance when built on BEVDet and BEVDepth.
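The summary gives no architectural details, so the following is only a guess at what a multi-scale region-oriented attention could look like: per-scale heatmap heads, a merged object-region map, and feature reweighting. The head design and how ROA-BEV consumes the map are assumptions, not the paper's actual network.

```python
# Speculative sketch of multi-scale region-oriented attention (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleROA(nn.Module):
    def __init__(self, channels=(256, 256, 256)):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(c, 1, 1) for c in channels)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps from a multi-scale backbone.
        H, W = feats[0].shape[-2:]
        maps = [F.interpolate(torch.sigmoid(head(f)), size=(H, W),
                              mode="bilinear", align_corners=False)
                for head, f in zip(self.heads, feats)]
        roa = torch.stack(maps).mean(0)     # merged object-region attention map
        return feats[0] * roa               # emphasize likely object regions
```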
arXiv Detail & Related papers (2024-10-14T08:51:56Z)
- GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection.
Existing methods overlook the geometric quality of the BEV representation, leaving it in a low-resolution state.
arXiv Detail & Related papers (2024-09-03T11:57:36Z)
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
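A minimal sketch of what an object-wise position embedding might look like, assuming each object's predicted 2D center and depth are lifted to a 3D position and embedded with a small MLP; the module below is illustrative, not OPEN's actual architecture.

```python
# Assumed reading of "object-wise position embedding": lift each object's
# (center, depth) to 3D and add an MLP embedding to its query feature.
import torch
import torch.nn as nn

class ObjectPositionEmbedding(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, obj_feats, centers_uv, depths, K_inv):
        # obj_feats: (N, dim); centers_uv: (N, 2) float pixels; depths: (N,).
        ones = torch.ones(centers_uv.shape[0], 1)
        rays = (K_inv @ torch.cat([centers_uv, ones], dim=1).t()).t()  # (N, 3)
        xyz = rays * depths.unsqueeze(1)    # per-object 3D position (camera frame)
        return obj_feats + self.mlp(xyz)    # inject depth-aware position
```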
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
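The summary describes lifting a single image into a LiDAR-style point cloud; the snippet below shows the standard pseudo-LiDAR back-projection such pipelines build on (a generic sketch, not VFMM3D's code), where `depth` would come from a depth network, e.g. one driven by a vision foundation model.

```python
# Generic pseudo-LiDAR lifting: one 3D point per pixel from predicted depth.
import torch

def image_to_pseudo_lidar(depth, K):
    # depth: (H, W) predicted metric depth; K: (3, 3) camera intrinsics.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u.flatten().float(), v.flatten().float(), torch.ones(H * W)])
    pts = torch.linalg.inv(K) @ pix * depth.flatten()  # back-project every pixel
    return pts.t()                                     # (H*W, 3) point cloud
```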
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark.
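One hedged reading of "extracting features along the depth axis of every pixel" is to sample each pixel's features at several offsets along the line toward a vanishing point and mix the samples with a 1x1 convolution; the parameterization below is an assumption, not the paper's exact layer.

```python
# Assumed perspective-aware convolution: sample along per-pixel depth-axis lines.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerspectiveConv(nn.Module):
    def __init__(self, c_in, c_out, taps=5, step=0.05):
        super().__init__()
        self.taps, self.step = taps, step
        self.mix = nn.Conv2d(c_in * taps, c_out, kernel_size=1)

    def forward(self, x, vp):
        # x: (B, C, H, W); vp: (B, 2) vanishing point in [-1, 1] grid coords.
        B, _, H, W = x.shape
        gy, gx = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack([gx, gy], dim=-1).expand(B, H, W, 2)
        direction = vp.view(B, 1, 1, 2) - base          # toward the vanishing point
        samples = [F.grid_sample(x, base + k * self.step * direction,
                                 align_corners=True)    # walk the depth axis
                   for k in range(self.taps)]
        return self.mix(torch.cat(samples, dim=1))
```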
arXiv Detail & Related papers (2023-08-24T17:25:36Z)
- BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods with only a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
3D object detection using LiDAR data is an indispensable component for autonomous driving systems.
We propose a novel multi-task framework that jointly performs 3D object detection and panoptic segmentation.
arXiv Detail & Related papers (2022-03-04T04:57:05Z)
- BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View
On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices.
We present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images.
arXiv Detail & Related papers (2020-03-09T15:08:40Z)