Open-NeRF: Towards Open Vocabulary NeRF Decomposition
- URL: http://arxiv.org/abs/2310.16383v1
- Date: Wed, 25 Oct 2023 05:43:14 GMT
- Title: Open-NeRF: Towards Open Vocabulary NeRF Decomposition
- Authors: Hao Zhang, Fang Li, and Narendra Ahuja
- Abstract summary: We present Open-vocabulary Embedded Neural Radiance Fields (Open-NeRF)
Open-NeRF leverage large-scale, off-the-shelf, segmentation models like the Segment Anything Model (SAM)
Our experimental results show that the proposed Open-NeRF outperforms state-of-the-art methods such as LERF citelerf and FFD citeffd in open-vocabulary scenarios.
- Score: 14.759265492381509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the challenge of decomposing Neural Radiance Fields
(NeRF) into objects from an open vocabulary, a critical task for object
manipulation in 3D reconstruction and view synthesis. Current techniques for
NeRF decomposition involve a trade-off between the flexibility of processing
open-vocabulary queries and the accuracy of 3D segmentation. We present,
Open-vocabulary Embedded Neural Radiance Fields (Open-NeRF), that leverage
large-scale, off-the-shelf, segmentation models like the Segment Anything Model
(SAM) and introduce an integrate-and-distill paradigm with hierarchical
embeddings to achieve both the flexibility of open-vocabulary querying and 3D
segmentation accuracy. Open-NeRF first utilizes large-scale foundation models
to generate hierarchical 2D mask proposals from varying viewpoints. These
proposals are then aligned via tracking approaches and integrated within the 3D
space and subsequently distilled into the 3D field. This process ensures
consistent recognition and granularity of objects from different viewpoints,
even in challenging scenarios involving occlusion and indistinct features. Our
experimental results show that the proposed Open-NeRF outperforms
state-of-the-art methods such as LERF \cite{lerf} and FFD \cite{ffd} in
open-vocabulary scenarios. Open-NeRF offers a promising solution to NeRF
decomposition, guided by open-vocabulary queries, enabling novel applications
in robotics and vision-language interaction in open-world 3D scenes.
Related papers
- OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views [90.71215823587875]
We propose OpenNeRF which naturally operates on posed images and directly encodes the VLM features within the NeRF.
Our work shows that using pixel-wise VLM features results in an overall less complex architecture without the need for additional DINO regularization.
For 3D point cloud segmentation on the Replica dataset, OpenNeRF outperforms recent open-vocabulary methods such as LERF and OpenScene by at least +4.9 mIoU.
arXiv Detail & Related papers (2024-04-04T17:59:08Z) - GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields [50.68719394443926]
Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF) is a novel approach offering a generalizable implicit representation of 3D scenes with open-vocabulary semantics.
GOV-NeSF exhibits state-of-the-art performance in both 2D and 3D open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2024-04-01T05:19:50Z) - 3D Visibility-aware Generalizable Neural Radiance Fields for Interacting
Hands [51.305421495638434]
Neural radiance fields (NeRFs) are promising 3D representations for scenes, objects, and humans.
This paper proposes a generalizable visibility-aware NeRF framework for interacting hands.
Experiments on the Interhand2.6M dataset demonstrate that our proposed VA-NeRF outperforms conventional NeRFs significantly.
arXiv Detail & Related papers (2024-01-02T00:42:06Z) - Weakly Supervised 3D Open-vocabulary Segmentation [104.07740741126119]
We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner.
We distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF)
A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process.
arXiv Detail & Related papers (2023-05-23T14:16:49Z) - NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from
3D-aware Diffusion [107.67277084886929]
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
arXiv Detail & Related papers (2023-02-20T17:12:00Z) - SegNeRF: 3D Part Segmentation with Neural Radiance Fields [63.12841224024818]
SegNeRF is a neural field representation that integrates a semantic field along with the usual radiance field.
SegNeRF is capable of simultaneously predicting geometry, appearance, and semantic information from posed images, even for unseen objects.
SegNeRF is able to generate an explicit 3D model from a single image of an object taken in the wild, with its corresponding part segmentation.
arXiv Detail & Related papers (2022-11-21T07:16:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.