GNeSF: Generalizable Neural Semantic Fields
- URL: http://arxiv.org/abs/2310.15712v2
- Date: Thu, 26 Oct 2023 06:40:23 GMT
- Title: GNeSF: Generalizable Neural Semantic Fields
- Authors: Hanlin Chen, Chen Li, Mengqi Guo, Zhiwen Yan, Gim Hee Lee
- Abstract summary: We introduce a generalizable 3D segmentation framework based on implicit representation.
We propose a novel soft voting mechanism to aggregate the 2D semantic information from different views for each 3D point.
Our approach can even outperform existing strong supervision-based approaches with only 2D annotations.
- Score: 48.49860868061573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D scene segmentation based on neural implicit representation has emerged
recently with the advantage of training only on 2D supervision. However,
existing approaches still require expensive per-scene optimization that
prohibits generalization to novel scenes during inference. To circumvent this
problem, we introduce a generalizable 3D segmentation framework based on
implicit representation. Specifically, our framework takes multi-view image
features and semantic maps as inputs instead of only spatial information to
avoid overfitting to scene-specific geometric and semantic information. We
propose a novel soft voting mechanism to aggregate the 2D semantic information
from different views for each 3D point. In addition to the image features, view
difference information is also encoded in our framework to predict the voting
scores. Intuitively, this allows the semantic information from nearby views to
contribute more compared to distant ones. Furthermore, a visibility module is
also designed to detect and filter out detrimental information from occluded
views. Due to the generalizability of our proposed method, we can synthesize
semantic maps or conduct 3D semantic segmentation for novel scenes with solely
2D semantic supervision. Experimental results show that our approach achieves
comparable performance with scene-specific approaches. More importantly, our
approach can even outperform existing strong supervision-based approaches with
only 2D annotations. Our source code is available at:
https://github.com/HLinChen/GNeSF.
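The aggregation step described in the abstract can be made concrete with a small sketch. The snippet below is a minimal illustration of per-point soft voting across source views, combining image features with a view-difference encoding to predict voting scores and masking out occluded views; the tensor shapes, the single linear score head, and the name soft_vote_semantics are illustrative assumptions, not the authors' released implementation (see the repository above for that).

```python
# Minimal sketch of per-point soft voting over source views, assuming the
# semantics/features have already been sampled at each view's projection of the point.
import torch
import torch.nn.functional as F

def soft_vote_semantics(view_semantics, view_features, view_diff, visibility):
    """
    view_semantics: (V, C) per-view 2D semantic probabilities at the point's projections
    view_features:  (V, D) image features sampled at the same projections
    view_diff:      (V, E) encoding of the viewing-direction difference per source view
    visibility:     (V,)   in [0, 1]; ~0 means the point is occluded in that view
    returns:        (C,)   aggregated semantic distribution for the 3D point
    """
    # Predict an unnormalized voting score per view from features + view difference.
    # A fixed linear layer stands in here for a learned MLP score head.
    score_head = torch.nn.Linear(view_features.shape[1] + view_diff.shape[1], 1)
    logits = score_head(torch.cat([view_features, view_diff], dim=-1)).squeeze(-1)  # (V,)

    # Suppress occluded views before normalization (visibility-module role).
    logits = logits.masked_fill(visibility < 0.5, float("-inf"))

    # Softmax over views yields soft voting weights; after training, nearby and
    # visible views should contribute more than distant or occluded ones.
    weights = F.softmax(logits, dim=0)                       # (V,)
    return (weights.unsqueeze(-1) * view_semantics).sum(0)   # (C,)

# Toy usage: 4 source views, 21 semantic classes.
if __name__ == "__main__":
    V, C, D, E = 4, 21, 32, 8
    sem = F.softmax(torch.randn(V, C), dim=-1)
    out = soft_vote_semantics(sem, torch.randn(V, D), torch.randn(V, E),
                              torch.tensor([1.0, 1.0, 0.0, 1.0]))
    print(out.shape)  # torch.Size([21])
```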
Related papers
- Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning [119.99066522299309]
KYN is a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density.
We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation.
We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work.
arXiv Detail & Related papers (2024-04-04T17:59:59Z)
- GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields [50.68719394443926]
Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF) is a novel approach offering a generalizable implicit representation of 3D scenes with open-vocabulary semantics.
GOV-NeSF exhibits state-of-the-art performance in both 2D and 3D open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2024-04-01T05:19:50Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Learning 3D Semantics from Pose-Noisy 2D Images with Hierarchical Full Attention Network [17.58032517457836]
We propose a novel framework to learn 3D point cloud semantics from 2D multi-view image observations containing pose errors.
A hierarchical full attention network (HiFANet) is designed to sequentially aggregate patch, bag-of-frames, and inter-point semantic cues.
Experiment results show that the proposed framework outperforms existing 3D point cloud based methods significantly.
arXiv Detail & Related papers (2022-04-17T20:24:26Z)
- NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes [25.26518805603798]
NeSF is a method for producing 3D semantic fields from posed RGB images alone.
Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training.
arXiv Detail & Related papers (2021-11-25T21:44:54Z)
- Semantic Correspondence via 2D-3D-2D Cycle [58.023058561837686]
We propose a new method for predicting semantic correspondences by lifting the task to the 3D domain.
We show that our method gives comparable and even superior results on standard semantic benchmarks.
arXiv Detail & Related papers (2020-04-20T05:27:45Z)
- Semantic Implicit Neural Scene Representations With Semi-Supervised Training [47.61092265963234]
We show that implicit neural scene representations can be leveraged to perform per-point semantic segmentation.
Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks.
We explore two novel applications for this semantically aware implicit neural scene representation.
arXiv Detail & Related papers (2020-03-28T00:43:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.