BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy
- URL: http://arxiv.org/abs/2305.16829v2
- Date: Thu, 11 Jan 2024 03:13:31 GMT
- Title: BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy
- Authors: Zaibin Zhang, Yuanhang Zhang, Lijun Wang, Yifan Wang, Huchuan Lu
- Abstract summary: We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO outperforms state-of-the-art methods while adding only a negligible increase in parameters and computational overhead.
- Score: 58.92659367605442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A popular approach for constructing bird's-eye-view (BEV) representation in
3D detection is to lift 2D image features onto the viewing frustum space based
on explicitly predicted depth distribution. However, depth distribution can
only characterize the 3D geometry of visible object surfaces but fails to
capture their internal space and overall geometric structure, leading to sparse
and unsatisfactory 3D representations. To mitigate this issue, we present
BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance
occupancy information. At the core of our method is the newly-designed instance
occupancy prediction (IOP) module, which aims to infer point-level occupancy
status for each instance in the frustum space. To ensure training efficiency
while maintaining representational flexibility, it is trained using the
combination of both explicit and implicit supervision. With the predicted
occupancy, we further design a geometry-aware feature propagation mechanism
(GFP), which performs self-attention based on occupancy distribution along each
ray in frustum and is able to enforce instance-level feature consistency. By
integrating the IOP module with GFP mechanism, our BEV-IO detector is able to
render highly informative 3D scene structures with more comprehensive BEV
representations. Experimental results demonstrate that BEV-IO can outperform
state-of-the-art methods while adding only a negligible increase in parameters
(0.2%) and computational overhead (0.24% in GFLOPs).
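The two-step pipeline the abstract describes can be sketched in NumPy: first, LSS-style lifting places a pixel's 2D feature at every depth bin along its ray, weighted by the predicted depth distribution; second, a geometry-aware propagation step runs self-attention along the ray with logits biased by the predicted instance occupancy. The additive log-occupancy bias and all function names below are illustrative assumptions, not the exact formulation used in BEV-IO.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lift_to_frustum(img_feat, depth_logits):
    """LSS-style lifting: replicate a pixel's 2D feature at every depth bin,
    weighted by the explicitly predicted depth distribution (an outer product).

    img_feat:     (C,)  image feature of one pixel
    depth_logits: (D,)  unnormalised depth-bin scores for that pixel
    returns:      (D, C) frustum features along the pixel's ray
    """
    depth_dist = softmax(depth_logits)      # explicit depth distribution
    return np.outer(depth_dist, img_feat)   # (D, C)

def geometry_aware_propagation(ray_feats, occupancy):
    """Sketch of GFP along one frustum ray: dot-product self-attention whose
    logits are biased toward bins predicted as occupied, so that bins of the
    same instance exchange features more strongly. The additive log-bias is
    an assumed form, not the paper's exact mechanism.

    ray_feats: (D, C) frustum features along a ray
    occupancy: (D,)   instance-occupancy probabilities in (0, 1]
    """
    D, C = ray_feats.shape
    logits = ray_feats @ ray_feats.T / np.sqrt(C)  # (D, D) similarities
    logits = logits + np.log(occupancy + 1e-6)     # favour occupied bins as keys
    attn = softmax(logits, axis=-1)                # rows sum to 1
    return attn @ ray_feats                        # (D, C) propagated features
```

In a full detector these per-ray operations run in parallel over all pixels and rays before the frustum features are pooled into the BEV grid.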
Related papers
- GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection [36.245654685143016]
Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection.
Existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state.
arXiv Detail & Related papers (2024-09-03T11:57:36Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos [20.51396212498941]
SparseBEV is a fully sparse 3D object detector that outperforms its dense counterparts.
On the test split of nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS.
arXiv Detail & Related papers (2023-08-18T02:11:01Z)
- LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation [38.38852904444365]
This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D.
Our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2023-04-04T04:05:56Z)
- BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap [10.060577111347152]
We propose a novel LiDAR-based 3D object detection model named BSH-Det3D.
It applies an effective way to enhance spatial features by estimating complete shapes from a bird's eye view.
Experiments on the KITTI benchmark show state-of-the-art (SOTA) performance in terms of both accuracy and speed.
arXiv Detail & Related papers (2023-03-03T15:13:11Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M$^2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$^2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well on well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of drawing effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.