OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving
- URL: http://arxiv.org/abs/2506.23565v1
- Date: Mon, 30 Jun 2025 07:18:17 GMT
- Title: OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving
- Authors: Mingqian Ji, Jian Yang, Shanshan Zhang,
- Abstract summary: Current multi-view 3D object detection methods typically transfer 2D features into 3D space using depth estimation or 3D position encoder.<n>Inspired by the success of radiance fields on 3D reconstruction, we assume they can be used to enhance the detector's ability of 3D geometry estimation.<n>We propose Object-centric Radiance Fields (OcRF) to enhance 3D voxel features via an auxiliary task of rendering foreground objects.
- Score: 32.07206206508925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current multi-view 3D object detection methods typically transfer 2D features into 3D space using depth estimation or 3D position encoder, but in a fully data-driven and implicit manner, which limits the detection performance. Inspired by the success of radiance fields on 3D reconstruction, we assume they can be used to enhance the detector's ability of 3D geometry estimation. However, we observe a decline in detection performance, when we directly use them for 3D rendering as an auxiliary task. From our analysis, we find the performance drop is caused by the strong responses on the background when rendering the whole scene. To address this problem, we propose object-centric radiance fields, focusing on modeling foreground objects while discarding background noises. Specifically, we employ Object-centric Radiance Fields (OcRF) to enhance 3D voxel features via an auxiliary task of rendering foreground objects. We further use opacity - the side-product of rendering- to enhance the 2D foreground BEV features via Height-aware Opacity-based Attention (HOA), where attention maps at different height levels are generated separately via multiple networks in parallel. Extensive experiments on the nuScenes validation and test datasets demonstrate that our OcRFDet achieves superior performance, outperforming previous state-of-the-art methods with 57.2$\%$ mAP and 64.8$\%$ NDS on the nuScenes test benchmark. Code will be available at https://github.com/Mingqj/OcRFDet.
Related papers
- 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection [58.78881632019072]
We introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD)<n>We lift the open-set 2D detection into 3D space through our designed 3D bounding box head.<n>We condition the object queries with geometry prior and overcome the generalization for 3D estimation across diverse scenes.
arXiv Detail & Related papers (2025-07-31T13:56:41Z) - What Matters in Range View 3D Object Detection [15.147558647138629]
Lidar-based perception pipelines rely on 3D object detection models to interpret complex scenes.
We achieve state-of-the-art amongst range-view 3D object detection models without using multiple techniques proposed in past range-view literature.
arXiv Detail & Related papers (2024-07-23T18:42:37Z) - MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection [31.58403386994297]
We propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy.
Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations.
To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception.
arXiv Detail & Related papers (2023-08-18T09:39:52Z) - 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting out almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z) - Surface-biased Multi-Level Context 3D Object Detection [1.9723551683930771]
This work addresses the object detection task in 3D point clouds using a highly efficient, surface-biased, feature extraction method (wang2022rbgnet)
We propose a 3D object detector that extracts accurate feature representations of object candidates and leverages self-attention on point patches, object candidates, and on the global scene in 3D scene.
arXiv Detail & Related papers (2023-02-13T11:50:04Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object
Detection [17.526914782562528]
We propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning (GSL)
Our best model achieves 49.5 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various published image-view 3D object detectors.
arXiv Detail & Related papers (2022-04-25T12:10:34Z) - PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z) - Expandable YOLO: 3D Object Detection from RGB-D Images [64.14512458954344]
This paper aims at constructing a light-weight object detector that inputs a depth and a color image from a stereo camera.
By extending the network architecture of YOLOv3 to 3D in the middle, it is possible to output in the depth direction.
Intersection over Uninon (IoU) in 3D space is introduced to confirm the accuracy of region extraction results.
arXiv Detail & Related papers (2020-06-26T07:32:30Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.