NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of
3D Scenes
- URL: http://arxiv.org/abs/2111.13260v2
- Date: Mon, 29 Nov 2021 15:30:33 GMT
- Title: NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of
3D Scenes
- Authors: Suhani Vora and Noha Radwan and Klaus Greff and Henning Meyer and Kyle
Genova and Mehdi S. M. Sajjadi and Etienne Pot and Andrea Tagliasacchi and
Daniel Duckworth
- Abstract summary: NeSF is a method for producing 3D semantic fields from posed RGB images alone.
Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training.
- Score: 25.26518805603798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB
images alone. In place of classical 3D representations, our method builds on
recent work in implicit neural scene representations wherein 3D structure is
captured by point-wise functions. We leverage this methodology to recover 3D
density fields upon which we then train a 3D semantic segmentation model
supervised by posed 2D semantic maps. Despite being trained on 2D signals
alone, our method is able to generate 3D-consistent semantic maps from novel
camera poses and can be queried at arbitrary 3D points. Notably, NeSF is
compatible with any method producing a density field, and its accuracy improves
as the quality of the density field improves. Our empirical analysis
demonstrates comparable quality to competitive 2D and 3D semantic segmentation
baselines on complex, realistically rendered synthetic scenes. Our method is
the first to offer truly dense 3D scene segmentations requiring only 2D
supervision for training, and does not require any semantic input for inference
on novel scenes. We encourage readers to visit the project website.
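To make the pipeline concrete, here is a minimal PyTorch sketch of the idea described above: a frozen, pretrained density field is queried along camera rays, a learned semantic model maps density to per-class logits, and the logits are volume-rendered so they can be supervised by posed 2D semantic maps. All names, shapes, and the per-point MLP head are illustrative assumptions, not the authors' actual architecture or API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticHead(nn.Module):
    """Hypothetical per-point head mapping a density value to class logits."""
    def __init__(self, num_classes: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, sigma):              # sigma: [R, S, 1]
        return self.net(sigma)             # [R, S, num_classes]

def render_semantic_map(density_fn, head, rays_o, rays_d,
                        near=0.1, far=6.0, n_samples=64):
    """Volume-renders class probabilities for a batch of rays [R, 3]."""
    t = torch.linspace(near, far, n_samples)                        # [S]
    pts = rays_o[:, None] + rays_d[:, None] * t[None, :, None]      # [R, S, 3]
    with torch.no_grad():                  # the density field stays frozen
        sigma = density_fn(pts)            # [R, S]; assumed interface
    logits = head(sigma.unsqueeze(-1))     # [R, S, C]

    # Standard NeRF-style compositing weights derived from the density field.
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)                         # [R, S]
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans                                         # [R, S]
    return (weights.unsqueeze(-1) * logits.softmax(-1)).sum(dim=1)  # [R, C]

def semantic_loss(probs, labels):
    """Cross-entropy between rendered probabilities [R, C] and the
    ground-truth 2D semantic labels [R] at the sampled pixels."""
    return F.nll_loss(torch.log(probs + 1e-10), labels)
```

Because the semantic model consumes only the density field, any method exposing a queryable density function can be swapped in, which matches the compatibility property the abstract claims.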
Related papers
- MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors [11.118490283303407]
We propose a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D.
Our method produces accurate semantics and geometry in both 3D and 2D space.
arXiv Detail & Related papers (2024-09-21T05:12:13Z)
- Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space [10.49905491984899]
This paper revisits the problem setting to pursue a better 3D understanding of scenes modeled by NeRFs and 3DGS.
We directly supervise the 3D points to train the language embedding field.
It achieves state-of-the-art accuracy without relying on multi-scale language embeddings.
arXiv Detail & Related papers (2024-08-14T09:50:02Z)
- WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space [77.92350895927922]
We propose WildFusion, a new approach to 3D-aware image synthesis based on latent diffusion models (LDMs).
Our 3D-aware LDM is trained without any direct supervision from multiview images or 3D geometry.
This opens up promising research avenues for scalable 3D-aware image synthesis and 3D content creation from in-the-wild image data.
arXiv Detail & Related papers (2023-11-22T18:25:51Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations.
PonderV2 is the first to achieve state-of-the-art performance on 11 indoor and outdoor benchmarks, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views and set up a dense correspondence learning task within a contrastive learning framework (see the sketch after this list).
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- 3D-Aware Indoor Scene Synthesis with Depth Priors [62.82867334012399]
Existing methods fail to model indoor scenes due to the large diversity of room layouts and the objects inside.
We argue that indoor scenes do not have a shared intrinsic structure, and hence 2D images alone cannot adequately guide the model with 3D geometry.
arXiv Detail & Related papers (2022-02-17T09:54:29Z)
- Semantic Correspondence via 2D-3D-2D Cycle [58.023058561837686]
We propose a new method for predicting semantic correspondences by lifting the task to the 3D domain.
We show that our method gives comparable and even superior results on standard semantic benchmarks.
arXiv Detail & Related papers (2020-04-20T05:27:45Z)
- Semantic Implicit Neural Scene Representations With Semi-Supervised Training [47.61092265963234]
We show that implicit neural scene representations can be leveraged to perform per-point semantic segmentation.
Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks.
We explore two novel applications for this semantically aware implicit neural scene representation.
arXiv Detail & Related papers (2020-03-28T00:43:17Z)
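As referenced in the MvDeCor entry above, its summary describes a dense correspondence objective set up within a contrastive framework. The following is a generic sketch of such a pixel-level InfoNCE loss, not the paper's released implementation; the feature-map layout, coordinate convention, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def dense_correspondence_loss(feat_a, feat_b, uv_a, uv_b, temperature=0.07):
    """Contrastive loss over corresponding pixels of two renderings of the
    same shape. feat_a, feat_b: [C, H, W] feature maps from a shared 2D
    encoder; uv_a, uv_b: [N, 2] integer (x, y) pixel coordinates such that
    pixel uv_a[i] in view A and uv_b[i] in view B project from the same
    surface point (known exactly from the rendering geometry)."""
    emb_a = F.normalize(feat_a[:, uv_a[:, 1], uv_a[:, 0]].t(), dim=-1)  # [N, C]
    emb_b = F.normalize(feat_b[:, uv_b[:, 1], uv_b[:, 0]].t(), dim=-1)  # [N, C]
    logits = emb_a @ emb_b.t() / temperature        # [N, N] similarity matrix
    labels = torch.arange(logits.shape[0], device=logits.device)
    # Each pixel must identify its true counterpart among all sampled pixels,
    # in both directions, which pushes the 2D features toward the
    # view-invariant, geometrically consistent behavior the entry describes.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```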
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.