GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
- URL: http://arxiv.org/abs/2409.01816v2
- Date: Sun, 22 Dec 2024 14:15:10 GMT
- Title: GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
- Authors: Jinqing Zhang, Yanan Zhang, Yunlong Qi, Zehua Fu, Qingjie Liu, Yunhong Wang
- Abstract summary: Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection.
Existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state.
We propose Radial-Cartesian BEV Sampling (RC-Sampling) to generate high-resolution dense BEV representation.
In conjunction with the In-Box Label, Centroid-Aware Inner Loss (CAI Loss) is developed to capture the inner geometric structure of objects.
- Score: 36.245654685143016
- Abstract: Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection, demonstrating impressive perceptual capabilities. However, existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state and failing to restore the authentic geometric information of the scene. In this paper, we identify the drawbacks of previous approaches that limit the geometric quality of BEV representation and propose Radial-Cartesian BEV Sampling (RC-Sampling), which outperforms other feature transformation methods in efficiently generating high-resolution dense BEV representation to restore fine-grained geometric information. Additionally, we design a novel In-Box Label to substitute the traditional depth label generated from the LiDAR points. This label reflects the actual geometric structure of objects rather than just their surfaces, injecting real-world geometric information into the BEV representation. In conjunction with the In-Box Label, Centroid-Aware Inner Loss (CAI Loss) is developed to capture the inner geometric structure of objects. Finally, we integrate the aforementioned modules into a novel multi-view 3D object detector, dubbed GeoBEV, which achieves a state-of-the-art result of 66.2% NDS on the nuScenes test set. The code is available at https://github.com/mengtan00/GeoBEV.git.
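The abstract gives enough detail to sketch the core of RC-Sampling: lift image features onto a polar (radial) grid per camera, then bilinearly sample that grid into a high-resolution Cartesian BEV map, so the cost of raising the BEV resolution scales with the output grid rather than with voxel pooling. The sketch below is a minimal single-camera illustration of that idea under our own assumptions; the function name `rc_sampling`, its arguments, and the polar feature layout are ours, not the authors' released code (see the GitHub link above).

```python
# Minimal sketch of radial-to-Cartesian BEV sampling (our reading of the
# abstract, not the authors' implementation). Assumes a single camera whose
# lifted features live on a polar grid of D depth bins x W azimuth bins.
import torch
import torch.nn.functional as F

def rc_sampling(radial_feat, azimuth_range, depth_range, bev_size, bev_extent):
    """radial_feat: (C, D, W) polar features; returns (C, H_bev, W_bev)."""
    H, W = bev_size
    # Metric (x, y) coordinates of every Cartesian BEV cell centre.
    ys = torch.linspace(-bev_extent, bev_extent, H)
    xs = torch.linspace(-bev_extent, bev_extent, W)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    # Polar coordinates of each cell: range along the ray and azimuth
    # measured from the camera's forward (+y) axis.
    rng = torch.sqrt(xx ** 2 + yy ** 2)
    az = torch.atan2(xx, yy)
    # Normalise to [-1, 1] for grid_sample: the x axis indexes azimuth
    # bins, the y axis indexes depth bins of the polar feature map.
    az_n = 2 * (az - azimuth_range[0]) / (azimuth_range[1] - azimuth_range[0]) - 1
    rng_n = 2 * (rng - depth_range[0]) / (depth_range[1] - depth_range[0]) - 1
    grid = torch.stack([az_n, rng_n], dim=-1).unsqueeze(0)  # (1, H, W, 2)
    # Bilinear sampling; BEV cells outside the camera frustum get zeros.
    out = F.grid_sample(radial_feat.unsqueeze(0), grid, mode="bilinear",
                        padding_mode="zeros", align_corners=True)
    return out.squeeze(0)

# Toy usage: 64 channels, 112 depth bins, 88 azimuth bins -> 256x256 BEV.
feat = torch.randn(64, 112, 88)
bev = rc_sampling(feat, azimuth_range=(-0.6, 0.6), depth_range=(1.0, 60.0),
                  bev_size=(256, 256), bev_extent=51.2)
print(bev.shape)  # torch.Size([64, 256, 256])
```

Because the transform is a gather over the output grid, only this sampling step grows with BEV resolution. The In-Box Label described in the abstract would then supervise the lifted features by marking every depth bin whose 3D position falls inside a ground-truth box as positive, rather than only the bin at the LiDAR surface return.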
Related papers
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation [38.38852904444365]
This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D.
Our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2023-04-04T04:05:56Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- Rethinking Dimensionality Reduction in Grid-based 3D Object Detection [24.249147412551768]
We propose a novel point cloud detection network based on a multi-level feature dimensionality reduction strategy, called MDRNet.
In MDRNet, the Spatial-aware Dimensionality Reduction (SDR) is designed to dynamically focus on the valuable parts of the object during voxel-to-BEV feature transformation.
Experiments on nuScenes show that the proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-09-20T04:51:54Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a set of predetermined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- Learnable Triangulation for Deep Learning-based 3D Reconstruction of Objects of Arbitrary Topology from Single RGB Images [12.693545159861857]
We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images.
The proposed method outperforms the state-of-the-art in terms of visual quality, reconstruction accuracy, and computational time.
arXiv Detail & Related papers (2021-09-24T09:44:22Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage and is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)