BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy
- URL: http://arxiv.org/abs/2305.16829v2
- Date: Thu, 11 Jan 2024 03:13:31 GMT
- Title: BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy
- Authors: Zaibin Zhang, Yuanhang Zhang, Lijun Wang, Yifan Wang, Huchuan Lu
- Abstract summary: We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
- Score: 58.92659367605442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A popular approach for constructing bird's-eye-view (BEV) representation in
3D detection is to lift 2D image features onto the viewing frustum space based
on explicitly predicted depth distribution. However, depth distribution can
only characterize the 3D geometry of visible object surfaces but fails to
capture their internal space and overall geometric structure, leading to sparse
and unsatisfactory 3D representations. To mitigate this issue, we present
BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance
occupancy information. At the core of our method is the newly-designed instance
occupancy prediction (IOP) module, which aims to infer point-level occupancy
status for each instance in the frustum space. To ensure training efficiency
while maintaining representational flexibility, it is trained using the
combination of both explicit and implicit supervision. With the predicted
occupancy, we further design a geometry-aware feature propagation mechanism
(GFP), which performs self-attention based on the occupancy distribution along
each ray in the frustum and is able to enforce instance-level feature consistency. By
integrating the IOP module with the GFP mechanism, our BEV-IO detector is able to
render highly informative 3D scene structures with more comprehensive BEV
representations. Experimental results demonstrate that BEV-IO can outperform
state-of-the-art methods while only adding a negligible increase in parameters
(0.2%) and computational overhead (0.24% in GFLOPs).
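
The contrast the abstract draws between depth-based lifting and occupancy can be illustrated with a small toy sketch. This is not the authors' implementation: the tensor shapes, the helper names `lift` and `nonempty_bins`, and the hand-made `occupancy` mask are all assumptions chosen for illustration. It lifts a 2D feature map into a frustum by weighting each depth bin, first with a peaked depth distribution (visible surface only) and then with a hypothetical occupancy mask that also covers the object's interior.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax over the chosen axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def lift(feat, weights):
    # feat: (C, H, W) image features; weights: (D, H, W) per-bin weights.
    # Returns frustum features of shape (C, D, H, W) via an outer product.
    return feat[:, None, :, :] * weights[None, :, :, :]

def nonempty_bins(frustum, eps=1e-6):
    # Count depth bins that carry non-negligible features anywhere.
    return int((np.abs(frustum).max(axis=(0, 2, 3)) > eps).sum())

C, D, H, W = 4, 8, 2, 3          # toy sizes (assumed)
rng = np.random.default_rng(0)
feat = rng.standard_normal((C, H, W))

# Depth distribution sharply peaked at bin 2: the visible surface.
depth_logits = np.full((D, H, W), -20.0)
depth_logits[2] = 20.0
depth_prob = softmax(depth_logits, axis=0)

# Hypothetical occupancy: the object spans bins 2..4 along every ray.
occupancy = np.zeros((D, H, W))
occupancy[2:5] = 1.0

frustum_depth = lift(feat, depth_prob)
frustum_occ = lift(feat, occupancy)

print(nonempty_bins(frustum_depth), nonempty_bins(frustum_occ))
```

In this toy setup the depth-only lift populates a single bin per ray, while the occupancy-weighted lift populates three, which is the sparsity gap the abstract describes.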
Related papers
- LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation [10.434754671492723]
We propose LSSInst, a two-stage object detector incorporating BEV and instance representations in tandem.
The proposed detector exploits fine-grained pixel-level features that can be flexibly integrated into existing LSS-based BEV networks.
Our proposed framework generalizes well and improves the performance of modern LSS-based BEV perception methods without bells and whistles.
arXiv Detail & Related papers (2024-11-09T13:03:54Z)
- GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection [36.245654685143016]
Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection.
Existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state.
arXiv Detail & Related papers (2024-09-03T11:57:36Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation [38.38852904444365]
This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D.
Our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2023-04-04T04:05:56Z)
- BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap [10.060577111347152]
We propose a novel LiDAR-based 3D object detection model named BSH-Det3D.
It applies an effective way to enhance spatial features by estimating complete shapes from a bird's eye view.
Experiments on the KITTI benchmark achieve state-of-the-art (SOTA) performance in terms of accuracy and speed.
arXiv Detail & Related papers (2023-03-03T15:13:11Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)