NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection
- URL: http://arxiv.org/abs/2404.13921v2
- Date: Mon, 30 Dec 2024 13:26:37 GMT
- Title: NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection
- Authors: Chi Huang, Xinyang Li, Yansong Qu, Changli Wu, Xiaofan Li, Shengchuan Zhang, Liujuan Cao,
- Abstract summary: In indoor scenes, the diverse distribution of object locations and scales makes the visual 3D perception task a big challenge.<n>Previous works have demonstrated that implicit representation has the capacity to benefit the visual 3D perception task.<n>We propose a simple yet effective method, NeRF-DetS, to address these issues.
- Score: 17.631688089207724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In indoor scenes, the diverse distribution of object locations and scales makes the visual 3D perception task a big challenge. Previous works (e.g, NeRF-Det) have demonstrated that implicit representation has the capacity to benefit the visual 3D perception task in indoor scenes with high amount of overlap between input images. However, previous works cannot fully utilize the advancement of implicit representation because of fixed sampling and simple multi-view feature fusion. In this paper, inspired by sparse fashion method (e.g, DETR3D), we propose a simple yet effective method, NeRF-DetS, to address above issues. NeRF-DetS includes two modules: Progressive Adaptive Sampling Strategy (PASS) and Depth-Guided Simplified Multi-Head Attention Fusion (DS-MHA). Specifically, (1)PASS can automatically sample features of each layer within a dense 3D detector, using offsets predicted by the previous layer. (2)DS-MHA can not only efficiently fuse multi-view features with strong occlusion awareness but also reduce computational cost. Extensive experiments on ScanNetV2 dataset demonstrate our NeRF-DetS outperforms NeRF-Det, by achieving +5.02% and +5.92% improvement in mAP under IoU25 and IoU50, respectively. Also, NeRF-DetS shows consistent improvements on ARKITScenes.
Related papers
- Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection [40.14197775884804]
MonoASRH is a novel monocular 3D detection framework composed of Efficient Hybrid Feature Aggregation Module (EH-FAM) and Adaptive Scale-Aware 3D Regression Head (ASRH)
EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects.
ASRH encodes 2D bounding box dimensions and then fuses scale features with the semantic features aggregated by EH-FAM.
arXiv Detail & Related papers (2024-11-05T02:33:25Z) - Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks [4.499833362998488]
Implicit neural representations (INRs) use neural networks to provide continuous and resolution-independent representations of complex signals.
The proposed FKAN utilizes learnable activation functions modeled as Fourier series in the first layer to effectively control and learn the task-specific frequency components.
Experimental results show that our proposed FKAN model outperforms three state-of-the-art baseline schemes.
arXiv Detail & Related papers (2024-09-14T05:53:33Z) - FSMDet: Vision-guided feature diffusion for fully sparse 3D detector [0.8437187555622164]
We propose FSMDet (Fully Sparse Multi-modal Detection), which use visual information to guide the LiDAR feature diffusion process.
Our method can be up to 5 times more efficient than previous SOTA methods in the inference process.
arXiv Detail & Related papers (2024-09-11T01:55:45Z) - NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z) - NeRF-VPT: Learning Novel View Representations with Neural Radiance
Fields via View Prompt Tuning [63.39461847093663]
We propose NeRF-VPT, an innovative method for novel view synthesis to address these challenges.
Our proposed NeRF-VPT employs a cascading view prompt tuning paradigm, wherein RGB information gained from preceding rendering outcomes serves as instructive visual prompts for subsequent rendering stages.
NeRF-VPT only requires sampling RGB data from previous stage renderings as priors at each training stage, without relying on extra guidance or complex techniques.
arXiv Detail & Related papers (2024-03-02T22:08:10Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and AR KITScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields [54.482261428543985]
Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis.
3D Gaussian splatting has shown state-of-the-art performance on real-time radiance field rendering.
We propose architectural and training changes to efficiently avert this problem.
arXiv Detail & Related papers (2023-12-06T00:46:30Z) - StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z) - VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations [25.88881764546414]
VQ-NeRF is an efficient pipeline for enhancing implicit neural representations via vector quantization.
We present an innovative multi-scale NeRF sampling scheme that concurrently optimize the NeRF model at both compressed and original scales.
We incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions.
arXiv Detail & Related papers (2023-10-23T01:41:38Z) - NeRF-Det: Learning Geometry-Aware Volumetric Representation for
Multi-View 3D Object Detection [65.02633277884911]
We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.
Our method makes use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance.
arXiv Detail & Related papers (2023-07-27T04:36:16Z) - FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation
Models [21.523836478458524]
Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images.
We propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision models.
Our experiments demonstrate the effectiveness of FeatureNeRF as a generalizable 3D semantic feature extractor.
arXiv Detail & Related papers (2023-03-22T17:57:01Z) - NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from
3D-aware Diffusion [107.67277084886929]
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
arXiv Detail & Related papers (2023-02-20T17:12:00Z) - AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware
Training [100.33713282611448]
We conduct the first pilot study on training NeRF with high-resolution data.
We propose the corresponding solutions, including marrying the multilayer perceptron with convolutional layers.
Our approach is nearly free without introducing obvious training/testing costs.
arXiv Detail & Related papers (2022-11-17T17:22:28Z) - NeRF in detail: Learning to sample for view synthesis [104.75126790300735]
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis.
In this work we address a clear limitation of the vanilla coarse-to-fine approach -- that it is based on a performance and not trained end-to-end for the task at hand.
We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture.
arXiv Detail & Related papers (2021-06-09T17:59:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.