SparseGNV: Generating Novel Views of Indoor Scenes with Sparse Input Views
- URL: http://arxiv.org/abs/2305.07024v1
- Date: Thu, 11 May 2023 17:58:37 GMT
- Title: SparseGNV: Generating Novel Views of Indoor Scenes with Sparse Input Views
- Authors: Weihao Cheng, Yan-Pei Cao, Ying Shan
- Abstract summary: We present SparseGNV, a learning framework that incorporates 3D structures and image generative models to generate novel views.
SparseGNV is trained across a large indoor scene dataset to learn generalizable priors.
It can efficiently generate novel views of an unseen indoor scene in a feed-forward manner.
- Score: 16.72880076920758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of generating novel views of indoor scenes given sparse input views.
The challenge is to achieve both photorealism and view consistency. We present
SparseGNV: a learning framework that incorporates 3D structures and image
generative models to generate novel views with three modules. The first module
builds a neural point cloud as underlying geometry, providing contextual
information and guidance for the target novel view. The second module utilizes
a transformer-based network to map the scene context and the guidance into a
shared latent space and autoregressively decodes the target view in the form of
discrete image tokens. The third module reconstructs the tokens into the image
of the target view. SparseGNV is trained across a large indoor scene dataset to
learn generalizable priors. Once trained, it can efficiently generate novel
views of an unseen indoor scene in a feed-forward manner. We evaluate SparseGNV
on both real-world and synthetic indoor scenes and demonstrate that it
outperforms state-of-the-art methods based on either neural radiance fields or
conditional image generation.
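The abstract describes the architecture but includes no code. Below is a minimal sketch of how the three modules could fit together; all class names, tensor shapes, and internals (random point-cloud features, greedy token decoding, a patch-based token decoder) are hypothetical placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NeuralPointCloudGuide(nn.Module):
    """Module 1 (placeholder): build a neural point cloud from the input views
    and render scene-context and target-view guidance features from it."""
    def __init__(self, feat_dim=64, n_tokens=196):
        super().__init__()
        self.feat_dim, self.n_tokens = feat_dim, n_tokens

    def forward(self, input_views, input_poses, target_pose):
        b = input_views.shape[0]
        # A real implementation would unproject per-view features into 3D points
        # and reproject them into the target camera; random features stand in here.
        context = torch.randn(b, self.n_tokens, self.feat_dim)
        guidance = torch.randn(b, self.n_tokens, self.feat_dim)
        return context, guidance

class ViewGenerator(nn.Module):
    """Module 2 (placeholder): transformer mapping context + guidance into a
    shared latent space and predicting discrete image tokens (greedy decoding
    stands in for true autoregressive sampling)."""
    def __init__(self, feat_dim=64, vocab=1024, n_tokens=196):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(feat_dim, vocab)
        self.n_tokens = n_tokens

    def forward(self, context, guidance):
        h = self.encoder(torch.cat([context, guidance], dim=1))
        logits = self.to_logits(h[:, -self.n_tokens:])
        return logits.argmax(dim=-1)  # (b, n_tokens) discrete image tokens

class TokenDecoder(nn.Module):
    """Module 3 (placeholder): reconstruct the target-view image from the
    discrete tokens, one 16x16 patch per token."""
    def __init__(self, vocab=1024, feat_dim=64, patch=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, feat_dim)
        self.to_rgb = nn.Linear(feat_dim, 3 * patch * patch)
        self.patch = patch

    def forward(self, tokens):
        b, n = tokens.shape
        side = int(n ** 0.5)
        patches = self.to_rgb(self.embed(tokens)).view(b, side, side, 3, self.patch, self.patch)
        return patches.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, side * self.patch, side * self.patch)

# Feed-forward generation of one novel view for an unseen scene.
views = torch.randn(1, 3, 3, 224, 224)   # three sparse input views
poses = torch.randn(1, 3, 4, 4)          # their camera poses
target_pose = torch.randn(1, 4, 4)       # pose of the requested novel view
context, guidance = NeuralPointCloudGuide()(views, poses, target_pose)
tokens = ViewGenerator()(context, guidance)
novel_view = TokenDecoder()(tokens)      # (1, 3, 224, 224) generated image
print(novel_view.shape)
```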
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline operating on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can then be obtained via consistency-relevant differentiable rendering.
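As a rough illustration of this kind of completion pipeline, the sketch below runs a placeholder geometry-completion stage and a texture-inpainting stage over a voxel feature grid; the networks, grid resolution, and masking are assumptions made for the example, not CompNVS's actual components.

```python
import torch
import torch.nn as nn

class Completion3D(nn.Module):
    """Stand-in for the geometry-completion / texture-inpainting networks:
    a tiny 3D CNN over a voxel feature grid (stored densely here)."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv3d(channels, channels, 3, padding=1),
        )

    def forward(self, grid):
        return self.net(grid)

channels, res = 16, 32
observed = (torch.rand(1, 1, res, res, res) > 0.7).float()          # observed-voxel mask
feature_grid = observed * torch.randn(1, channels, res, res, res)   # encoded image features

geometry_net = Completion3D(channels)   # completes unobserved geometry
texture_net = Completion3D(channels)    # inpaints appearance features
completed = texture_net(geometry_net(feature_grid))

# A differentiable renderer would turn `completed` into view-consistent image
# sequences; this sketch stops at the completed grid.
print(completed.shape)  # torch.Size([1, 16, 32, 32, 32])
```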
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
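For reference, the standard volume rendering used by NeRF-style methods composites the MLP's predicted densities and colors along each camera ray (general formulation, not specific to this paper):

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)
```

Here \sigma_i and \mathbf{c}_i are the density and color predicted at the i-th sample along the ray and \delta_i is the distance between adjacent samples.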
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers [34.4824364161812]
Novel view synthesis is a problem where we are given only a few context views sparsely covering a scene or an object.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
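A minimal sketch of such a single-pass, 2D-only mapping is shown below; the patch tokenizer, pose embedding, and patch-based decoder are illustrative stand-ins chosen for brevity and do not reproduce ViewFormer's actual architecture.

```python
import torch
import torch.nn as nn

class SinglePassViewSynth(nn.Module):
    """One forward pass: patch tokens from the context views plus a query-pose
    token go through a transformer, and learned output queries are decoded
    into the novel view as 16x16 RGB patches."""
    def __init__(self, d=128, patch=16, out_res=128):
        super().__init__()
        self.patch, self.side = patch, out_res // patch           # 8 patches per side
        self.patchify = nn.Conv2d(3, d, kernel_size=patch, stride=patch)
        self.pose_embed = nn.Linear(12, d)                        # flattened 3x4 pose
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.out_queries = nn.Parameter(torch.randn(1, self.side ** 2, d))
        self.to_rgb = nn.Linear(d, 3 * patch * patch)

    def forward(self, context_views, query_pose):
        b, v = context_views.shape[:2]
        tok = self.patchify(context_views.flatten(0, 1))          # (b*v, d, 8, 8)
        tok = tok.flatten(2).transpose(1, 2)                      # (b*v, 64, d)
        tok = tok.reshape(b, v * tok.shape[1], -1)                # (b, v*64, d)
        pose = self.pose_embed(query_pose.flatten(1)).unsqueeze(1)
        h = self.backbone(torch.cat([tok, pose, self.out_queries.expand(b, -1, -1)], dim=1))
        out = self.to_rgb(h[:, -self.side ** 2:])                 # (b, 64, 3*16*16)
        out = out.view(b, self.side, self.side, 3, self.patch, self.patch)
        return out.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, self.side * self.patch, -1)

model = SinglePassViewSynth()
context = torch.randn(2, 3, 3, 128, 128)   # three context views per scene
query_pose = torch.randn(2, 3, 4)          # camera pose of the requested view
print(model(context, query_pose).shape)    # torch.Size([2, 3, 128, 128])
```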
arXiv Detail & Related papers (2022-03-18T21:08:23Z)
- pixelNeRF: Neural Radiance Fields from One or Few Images [20.607712035278315]
pixelNeRF is a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.
We conduct experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects.
In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction.
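The pixel-aligned conditioning idea, projecting each query 3D point into the input view and feeding the sampled CNN feature to the radiance-field MLP, can be sketched as follows; the encoder, the identity "camera" projection, and the tiny MLP are simplified placeholders rather than the released pixelNeRF code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Conv2d(3, 32, kernel_size=3, padding=1)             # stand-in image encoder
nerf_mlp = nn.Sequential(nn.Linear(3 + 3 + 32, 128), nn.ReLU(), nn.Linear(128, 4))

image = torch.randn(1, 3, 64, 64)        # single input view
feat = encoder(image)                    # (1, 32, 64, 64) pixel-aligned feature map
points = torch.rand(1, 1000, 3) * 2 - 1  # query points along camera rays, in [-1, 1]
dirs = F.normalize(torch.randn(1, 1000, 3), dim=-1)              # viewing directions

# Project points into normalized image coordinates (identity "camera" for the
# example) and bilinearly sample the feature map at those pixels.
uv = points[..., :2].unsqueeze(2)                           # (1, 1000, 1, 2)
point_feat = F.grid_sample(feat, uv, align_corners=True)    # (1, 32, 1000, 1)
point_feat = point_feat.squeeze(-1).transpose(1, 2)         # (1, 1000, 32)

# The radiance-field MLP sees the point, view direction, and sampled feature.
out = nerf_mlp(torch.cat([points, dirs, point_feat], dim=-1))
sigma, rgb = out[..., 0], out[..., 1:].sigmoid()            # density and color
print(sigma.shape, rgb.shape)  # torch.Size([1, 1000]) torch.Size([1, 1000, 3])
```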
arXiv Detail & Related papers (2020-12-03T18:59:54Z)
- Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision [26.885846254261626]
Continuous Object Representation Networks (CORN) is a conditional architecture that encodes an input image's geometry and appearance into a 3D-consistent scene representation.
CORN performs well on challenging tasks such as novel view synthesis and single-view 3D reconstruction, achieving performance comparable to state-of-the-art approaches that use direct supervision.
arXiv Detail & Related papers (2020-07-30T17:49:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.