NeRF-Det: Learning Geometry-Aware Volumetric Representation for
Multi-View 3D Object Detection
- URL: http://arxiv.org/abs/2307.14620v1
- Date: Thu, 27 Jul 2023 04:36:16 GMT
- Title: NeRF-Det: Learning Geometry-Aware Volumetric Representation for
Multi-View 3D Object Detection
- Authors: Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang,
Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka
- Abstract summary: We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.
Our method makes use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance.
- Score: 65.02633277884911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present NeRF-Det, a novel method for indoor 3D detection with posed RGB
images as input. Unlike existing indoor 3D detection methods that struggle to
model scene geometry, our method makes novel use of NeRF in an end-to-end
manner to explicitly estimate 3D geometry, thereby improving 3D detection
performance. Specifically, to avoid the significant extra latency associated
with per-scene optimization of NeRF, we introduce sufficient geometry priors to
enhance the generalizability of NeRF-MLP. Furthermore, we subtly connect the
detection and NeRF branches through a shared MLP, enabling an efficient
adaptation of NeRF to detection and yielding geometry-aware volumetric
representations for 3D detection. Our method outperforms state-of-the-arts by
3.9 mAP and 3.1 mAP on the ScanNet and ARKITScenes benchmarks, respectively. We
provide extensive analysis to shed light on how NeRF-Det works. As a result of
our joint-training design, NeRF-Det is able to generalize well to unseen scenes
for object detection, view synthesis, and depth estimation tasks without
requiring per-scene optimization. Code is available at
\url{https://github.com/facebookresearch/NeRF-Det}.
Related papers
- MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
Key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous method relies on NeRF for geometry reasoning.
We propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection.
arXiv Detail & Related papers (2024-10-28T21:58:41Z) - PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency [33.68948881727943]
PruNeRF is a segment-centric dataset pruning framework via 3D spatial consistency.
Our experiments on benchmark datasets demonstrate that PruNeRF consistently outperforms state-of-the-art methods in robustness against distractors.
arXiv Detail & Related papers (2024-06-02T16:49:05Z) - NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation [60.47114985993196]
NeRF-Det unifies the tasks of novel view arithmetic and 3D perception.
We introduce a novel 3D perception network structure, NeRF-DetS.
NeRF-DetS outperforms competitive NeRF-Det on the ScanNetV2 dataset.
arXiv Detail & Related papers (2024-04-22T06:59:03Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and AR KITScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection [31.58403386994297]
We propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy.
Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations.
To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception.
arXiv Detail & Related papers (2023-08-18T09:39:52Z) - Improving Neural Radiance Fields with Depth-aware Optimization for Novel
View Synthesis [12.3338393483795]
We propose SfMNeRF, a method to better synthesize novel views as well as reconstruct the 3D-scene geometry.
SfMNeRF employs the epipolar, photometric consistency, depth smoothness, and position-of-matches constraints to explicitly reconstruct the 3D-scene structure.
Experiments on two public datasets demonstrate that SfMNeRF surpasses state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-11T13:37:17Z) - EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object
Detection [51.52496693690059]
Fast stereo based 3D object detectors lag far behind high-precision oriented methods in accuracy.
We argue that the main reason is the missing or poor 3D geometry feature representation in fast stereo based methods.
The proposed EGFN outperforms YOLOStsereo3D, the advanced fast method, by 5.16% on mAP$_3d$ at the cost of merely additional 12 ms.
arXiv Detail & Related papers (2021-11-28T05:25:36Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.