GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D
Scene Understanding
- URL: http://arxiv.org/abs/2403.03608v1
- Date: Wed, 6 Mar 2024 10:55:50 GMT
- Title: GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D
Scene Understanding
- Authors: Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu, Yu-Chiang Frank Wang
- Abstract summary: We introduce a Generalizable Semantic Neural Radiance Field (GSNeRF), which incorporates image semantics into the synthesis process.
Our GSNeRF is composed of two stages: Semantic Geo-Reasoning and Depth-Guided Visual Rendering.
- Score: 30.951440204237166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance
Fields (NeRF) have emerged as a popular research topic in 3D vision. In this
work, we introduce a Generalizable Semantic Neural Radiance Field (GSNeRF),
which uniquely incorporates image semantics into the synthesis process so that
both novel-view images and the associated semantic maps can be produced for
unseen scenes. Our GSNeRF is composed of two stages: Semantic Geo-Reasoning and
Depth-Guided Visual Rendering. The former observes multi-view image inputs and
extracts semantic and geometry features from a scene. Guided by the resulting
scene geometry, the latter performs both image and semantic rendering with
improved performance. Our experiments not only confirm that GSNeRF performs
favorably against prior works on both novel-view image synthesis and semantic
segmentation, but also verify the effectiveness of our sampling strategy for
visual rendering.
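The exact sampling strategy of the Depth-Guided Visual Rendering stage is not
spelled out in this summary, so the snippet below is only a minimal sketch of
the general idea: instead of spacing ray samples uniformly between the near and
far planes, concentrate them around a per-ray depth predicted by a geometry
stage. The function name `sample_depth_guided` and its parameters are
illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sample_depth_guided(pred_depth, n_samples=32, sigma=0.05,
                        near=0.1, far=10.0, rng=None):
    """Illustrative depth-guided ray sampling (not the paper's exact method).

    pred_depth: (num_rays,) depth per ray, e.g. from a geometry-reasoning stage.
    Returns: (num_rays, n_samples) sorted sample distances along each ray.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_rays = pred_depth.shape[0]
    # Draw samples from a Gaussian band around the predicted surface depth,
    # so the radiance field is evaluated mostly near the estimated surface.
    offsets = rng.normal(0.0, sigma, size=(num_rays, n_samples))
    t = pred_depth[:, None] + offsets
    # Clamp to the valid ray interval and sort for front-to-back compositing.
    t = np.clip(t, near, far)
    return np.sort(t, axis=1)

# Toy usage: four rays whose predicted depths came from a geometry stage.
depths = np.array([1.2, 2.5, 0.9, 4.0])
print(sample_depth_guided(depths).shape)  # (4, 32)
```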
Related papers
- MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors [11.118490283303407]
We propose a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D.
Our method produces accurate semantics and geometry in both 3D and 2D space.
arXiv Detail & Related papers (2024-09-21T05:12:13Z)
- Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review [26.436253160392123]
This review thoroughly examines the role of semantically-aware Neural Radiance Fields (NeRFs) in visual scene understanding.
NeRFs adeptly infer 3D representations for both stationary and dynamic objects in a scene.
arXiv Detail & Related papers (2024-02-17T00:15:09Z)
- HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs [7.715395970689711]
We introduce Hierarchical Geometric, Semantic, and Photometric Guided NeRF (HG3-NeRF).
HG3-NeRF is a novel methodology that addresses the limitations of sparse view inputs and enhances the consistency of geometry, semantic content, and appearance across different views.
Experimental results demonstrate that HG3-NeRF can outperform other state-of-the-art methods on different standard benchmarks.
arXiv Detail & Related papers (2024-01-22T06:28:08Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields, to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z)
- SegNeRF: 3D Part Segmentation with Neural Radiance Fields [63.12841224024818]
SegNeRF is a neural field representation that integrates a semantic field along with the usual radiance field.
SegNeRF is capable of simultaneously predicting geometry, appearance, and semantic information from posed images, even for unseen objects.
SegNeRF is able to generate an explicit 3D model from a single image of an object taken in the wild, with its corresponding part segmentation.
arXiv Detail & Related papers (2022-11-21T07:16:03Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Semantic View Synthesis [56.47999473206778]
We tackle a new problem of semantic view synthesis: generating free-viewpoint renderings of a synthesized scene using a semantic label map as input.
First, we focus on synthesizing the color and depth of the visible surface of the 3D scene.
We then use the synthesized color and depth to impose explicit constraints on the multiple-plane image (MPI) representation prediction process.
arXiv Detail & Related papers (2020-08-24T17:59:46Z)
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [78.5281048849446]
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes.
Our algorithm represents a scene using a fully-connected (non-convolutional) deep network.
Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.
arXiv Detail & Related papers (2020-03-19T17:57:23Z)
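The NeRF entry above notes that volume rendering is naturally differentiable,
which is what lets the representation be optimized from posed images alone. The
snippet below is a minimal NumPy sketch of the standard discrete
volume-rendering quadrature used by NeRF; a real implementation runs the same
math in an autodiff framework over batches of rays.

```python
import numpy as np

def volume_render(sigmas, colors, t_vals):
    """Composite one ray with the discrete NeRF quadrature.

    sigmas: (n_samples,) densities predicted by the MLP along the ray.
    colors: (n_samples, 3) RGB values predicted at the same samples.
    t_vals: (n_samples,) sorted sample distances along the ray.
    """
    # Distances between adjacent samples; the last interval is open-ended.
    deltas = np.append(np.diff(t_vals), 1e10)
    # alpha_i = 1 - exp(-sigma_i * delta_i): opacity of each interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # T_i: transmittance accumulated before sample i.
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))
    weights = alphas * trans
    # Expected color along the ray.
    return (weights[:, None] * colors).sum(axis=0)

# Toy usage with random densities and colors along a single ray.
t = np.linspace(0.1, 4.0, 64)
print(volume_render(np.random.rand(64), np.random.rand(64, 3), t))
```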
This list is automatically generated from the titles and abstracts of the papers on this site.