Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views
of Novel Scenes
- URL: http://arxiv.org/abs/2104.06935v1
- Date: Wed, 14 Apr 2021 15:38:57 GMT
- Title: Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views
of Novel Scenes
- Authors: Julian Chibane, Aayush Bansal, Verica Lazova, Gerard Pons-Moll
- Abstract summary: We introduce Stereo Radiance Fields (SRF), a neural view synthesis approach that is trained end-to-end.
SRF generalizes to new scenes, and requires only sparse views at test time.
Experiments show that SRF learns structure instead of overfitting on a scene.
- Score: 48.0304999503795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent neural view synthesis methods have achieved impressive quality and
realism, surpassing classical pipelines which rely on multi-view
reconstruction. State-of-the-art methods, such as NeRF, are designed to learn a
single scene with a neural network and require dense multi-view inputs. Testing
on a new scene requires re-training from scratch, which takes 2-3 days. In this
work, we introduce Stereo Radiance Fields (SRF), a neural view synthesis
approach that is trained end-to-end, generalizes to new scenes, and requires
only sparse views at test time. The core idea is a neural architecture inspired
by classical multi-view stereo methods, which estimates surface points by
finding similar image regions in stereo images. In SRF, we predict color and
density for each 3D point given an encoding of its stereo correspondence in the
input images. The encoding is implicitly learned by an ensemble of pair-wise
similarities -- emulating classical stereo. Experiments show that SRF learns
structure instead of overfitting on a scene. We train on multiple scenes of the
DTU dataset and generalize to new ones without re-training, requiring only 10
sparse and spread-out views as input. We show that 10-15 minutes of fine-tuning
further improves the results, achieving significantly sharper, more detailed
results than scene-specific models. The code, model, and videos are available
at https://virtualhumans.mpi-inf.mpg.de/srf/.
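To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of the mechanism described in the abstract: features sampled from each input view at a query 3D point are compared through an ensemble of pair-wise similarities, and an MLP maps that encoding to density and color. All class names, shapes, and layer sizes are illustrative assumptions, not the authors' implementation (the released code is at the project page above).

```python
# Illustrative sketch only (not the authors' code): a 3D point is projected into
# each input view, per-view features are gathered, an ensemble of pair-wise
# similarities forms the point's encoding, and an MLP maps that encoding to
# density and color. All names and dimensions are assumptions.
import torch
import torch.nn as nn

class PairwiseStereoEncoder(nn.Module):
    def __init__(self, feat_dim=32, num_views=10):
        super().__init__()
        num_pairs = num_views * (num_views - 1) // 2
        self.head = nn.Sequential(
            nn.Linear(num_pairs, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.density = nn.Linear(128, 1)
        self.color = nn.Linear(128, 3)

    def forward(self, view_feats):
        # view_feats: (batch, num_views, feat_dim) -- features sampled at the
        # query point's projection in each input image (sampling not shown).
        b, v, d = view_feats.shape
        # Ensemble of pair-wise similarities (dot products), emulating classical stereo matching.
        sims = torch.einsum('bvd,bwd->bvw', view_feats, view_feats)
        iu = torch.triu_indices(v, v, offset=1)
        pair_sims = sims[:, iu[0], iu[1]]            # (batch, num_pairs)
        h = self.head(pair_sims)
        sigma = torch.relu(self.density(h))          # volume density
        rgb = torch.sigmoid(self.color(h))           # color in [0, 1]
        return rgb, sigma

# Example: encode 4096 query points observed in 10 views with 32-dim features.
enc = PairwiseStereoEncoder(feat_dim=32, num_views=10)
rgb, sigma = enc(torch.randn(4096, 10, 32))
```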
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- 3D Reconstruction with Generalizable Neural Fields using Scene Priors [71.37871576124789]
We introduce an approach for training generalizable Neural Fields incorporating scene Priors (NFPs).
The NFP network maps any single-view RGB-D image into signed distance and radiance values.
A complete scene can be reconstructed by merging individual frames in the volumetric space without a fusion module.
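A rough sketch of that merging idea, under the simplifying assumption that each posed RGB-D frame yields per-point signed-distance and radiance predictions that are directly averaged in a shared volumetric space; the per-frame conditioning below is a placeholder, not the NFP architecture.

```python
# Placeholder sketch: a query point plus a per-frame feature (standing in for
# the frame's RGB-D conditioning) is mapped to a signed distance and a color,
# and the per-frame predictions are merged by averaging in volume space,
# i.e. without a learned fusion module. Names and shapes are illustrative.
import torch
import torch.nn as nn

FEAT_DIM = 32
net = nn.Sequential(nn.Linear(3 + FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, 4))

def merge_frames(points, frame_feats):
    # points:      (num_points, 3) query positions in a shared world frame
    # frame_feats: (num_frames, FEAT_DIM) one feature per posed RGB-D frame
    preds = []
    for feat in frame_feats:                              # predict per frame
        inp = torch.cat([points, feat.expand(points.shape[0], -1)], dim=-1)
        preds.append(net(inp))
    merged = torch.stack(preds, dim=0).mean(dim=0)        # average in volume space
    return merged[:, :1], torch.sigmoid(merged[:, 1:])    # sdf, rgb

sdf, rgb = merge_frames(torch.rand(4096, 3), torch.randn(5, FEAT_DIM))
```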
arXiv Detail & Related papers (2023-09-26T18:01:02Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
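The volume-rendering step mentioned here follows the standard NeRF-style compositing of per-sample colors and densities along a ray; the generic sketch below illustrates that formulation and is not the paper's implementation.

```python
# Generic NeRF-style volume rendering along a ray: given per-sample colors and
# densities from a conditioned MLP, accumulate them with alpha compositing.
# Standard formulation, not the specific implementation of the paper.
import torch

def volume_render(rgb, sigma, z_vals):
    # rgb:    (num_rays, num_samples, 3) colors predicted by the MLP
    # sigma:  (num_rays, num_samples)    densities predicted by the MLP
    # z_vals: (num_rays, num_samples)    sample depths along each ray
    deltas = z_vals[:, 1:] - z_vals[:, :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                    # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)          # transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans                                     # compositing weights
    color = (weights[..., None] * rgb).sum(dim=1)               # (num_rays, 3)
    depth = (weights * z_vals).sum(dim=1)                       # expected depth
    return color, depth

rays, samples = 1024, 64
color, depth = volume_render(torch.rand(rays, samples, 3),
                             torch.rand(rays, samples),
                             torch.linspace(2.0, 6.0, samples).expand(rays, samples))
```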
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Remote Sensing Novel View Synthesis with Implicit Multiplane Representations [26.33490094119609]
We propose a novel remote sensing view synthesis method by leveraging the recent advances in implicit neural representations.
Because remote sensing images are captured from overhead viewpoints at large distances, we represent the 3D space by combining an implicit multiplane image (MPI) representation with deep neural networks.
Images from any novel views can be freely rendered on the basis of the reconstructed model.
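As an illustration of how an MPI is turned into an image, the sketch below over-composites per-plane RGBA layers back to front; the per-plane warp into the novel view (a homography) is omitted, and the function is a generic example rather than the paper's renderer.

```python
# Generic multiplane-image (MPI) compositing: each depth plane stores an RGBA
# image, and a view is formed by over-compositing the planes back to front.
# Warping each plane into the novel view is omitted; this only illustrates the
# compositing step and is not the paper's implementation.
import torch

def composite_mpi(rgba):
    # rgba: (num_planes, H, W, 4), ordered front (index 0) to back, alpha in [0, 1]
    out = torch.zeros(rgba.shape[1:3] + (3,))
    for plane in reversed(rgba.unbind(0)):            # back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)       # "over" operator
    return out

image = composite_mpi(torch.rand(32, 64, 64, 4))      # (64, 64, 3)
```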
arXiv Detail & Related papers (2022-05-18T13:03:55Z)
- ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers [34.4824364161812]
Novel view synthesis is a problem where we are given only a few context views sparsely covering a scene or an object.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
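A toy sketch of that data flow, assuming a placeholder convolutional encoder and pose MLP: context images and a query pose are mapped to an output image in one forward pass, with no per-scene optimization. The architecture below is illustrative only and is not the ViewFormer model (which, per the title, is transformer-based).

```python
# Toy sketch of a NeRF-free, 2D-only pipeline: context images are encoded,
# pooled together with the query camera pose, and decoded to a new image in a
# single forward pass. This placeholder only mirrors the data flow described
# above; it is not the ViewFormer architecture.
import torch
import torch.nn as nn

class SinglePassSynthesizer(nn.Module):
    def __init__(self, latent=256, image_size=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, latent),
        )
        self.pose_mlp = nn.Sequential(nn.Linear(12, latent), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent, image_size * image_size * 3), nn.Sigmoid(),
        )
        self.image_size = image_size

    def forward(self, context_images, query_pose):
        # context_images: (batch, num_views, 3, H, W); query_pose: (batch, 12)
        b, v = context_images.shape[:2]
        feats = self.encoder(context_images.flatten(0, 1)).view(b, v, -1).mean(dim=1)
        pose = self.pose_mlp(query_pose)
        img = self.decoder(torch.cat([feats, pose], dim=-1))
        return img.view(b, 3, self.image_size, self.image_size)

model = SinglePassSynthesizer()
out = model(torch.rand(2, 4, 3, 64, 64), torch.rand(2, 12))   # (2, 3, 64, 64)
```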
arXiv Detail & Related papers (2022-03-18T21:08:23Z)
- NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis [28.83180559337126]
We propose a novel network that can recover 3D scene geometry as a distance function, together with high-resolution color images.
Our method uses only a sparse set of images as input and can generalize well to novel scenes.
arXiv Detail & Related papers (2021-08-09T08:59:24Z)
- Neural Rays for Occlusion-aware Image-based Rendering [108.34004858785896]
We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis (NVS) task with multi-view images as input.
NeuRay can quickly generate high-quality novel view rendering images of unseen scenes with little finetuning.
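One way to picture occlusion-aware aggregation is to weight each input view's contribution to a query point by a predicted visibility, so occluded views are suppressed; the sketch below shows only that weighting step with placeholder inputs, not NeuRay's actual ray representation.

```python
# Sketch of occlusion-aware aggregation for image-based rendering: each input
# view contributes a color for the query point, weighted by a predicted
# visibility so occluded views are down-weighted. The visibility scores here
# are placeholders; NeuRay's actual representation is more involved.
import torch

def aggregate_views(view_colors, visibility_logits):
    # view_colors:       (num_points, num_views, 3) colors sampled from each view
    # visibility_logits: (num_points, num_views)    predicted visibility scores
    weights = torch.softmax(visibility_logits, dim=-1)          # sum to 1 per point
    return (weights[..., None] * view_colors).sum(dim=1)        # (num_points, 3)

colors = aggregate_views(torch.rand(2048, 8, 3), torch.randn(2048, 8))
```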
arXiv Detail & Related papers (2021-07-28T15:09:40Z)
- Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans [56.63912568777483]
This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.
We propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh.
Experiments on ZJU-MoCap show that our approach outperforms prior works by a large margin in terms of novel view synthesis quality.
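A minimal sketch of the shared-latent-code idea, under simplifying assumptions: one code per mesh vertex is reused across all frames, vertices are posed per frame, and a query point takes the code of its nearest posed vertex before a small MLP decodes density and color. The real method additionally diffuses the codes with a sparse 3D CNN, which is omitted; everything below is illustrative.

```python
# Sketch of frame-shared latent codes anchored to a deformable mesh. The vertex
# count, code size, and decoder are illustrative placeholders.
import torch
import torch.nn as nn

class AnchoredCodes(nn.Module):
    def __init__(self, num_vertices=6890, code_dim=16):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_vertices, code_dim))  # shared across frames
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 4),                       # density + RGB
        )

    def forward(self, points, posed_vertices):
        # points:         (num_points, 3) query positions in the current frame
        # posed_vertices: (num_vertices, 3) mesh vertices deformed to this frame
        dists = torch.cdist(points, posed_vertices)              # (num_points, num_vertices)
        nearest = dists.argmin(dim=-1)                           # index of closest vertex
        gathered = self.codes[nearest]                           # (num_points, code_dim)
        out = self.mlp(torch.cat([gathered, points], dim=-1))
        sigma, rgb = torch.relu(out[:, :1]), torch.sigmoid(out[:, 1:])
        return sigma, rgb

model = AnchoredCodes()
sigma, rgb = model(torch.rand(1024, 3), torch.rand(6890, 3))
```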
arXiv Detail & Related papers (2020-12-31T18:55:38Z)
- pixelNeRF: Neural Radiance Fields from One or Few Images [20.607712035278315]
pixelNeRF is a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.
We conduct experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects.
In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction.
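The conditioning mechanism can be sketched as: project a query 3D point into the input view, bilinearly sample a CNN feature at that pixel, and feed the feature together with the point into the radiance-field MLP. The code below assumes precomputed normalized projections and a stand-in feature extractor; it is not the pixelNeRF implementation.

```python
# Sketch of conditioning a radiance field on image features: each query 3D
# point is projected into the input view, a CNN feature is bilinearly sampled
# at that pixel with grid_sample, and the feature is concatenated with the
# point before the MLP. Camera projection is reduced to precomputed normalized
# pixel coordinates; this is an illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageConditionedField(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.cnn = nn.Conv2d(3, feat_dim, 3, padding=1)   # stand-in feature extractor
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 4),                            # density + RGB
        )

    def forward(self, image, points, uv):
        # image:  (1, 3, H, W) input view
        # points: (num_points, 3) query positions
        # uv:     (num_points, 2) projections of the points, in [-1, 1]
        feat_map = self.cnn(image)                                        # (1, C, H, W)
        grid = uv.view(1, -1, 1, 2)                                       # (1, N, 1, 2)
        feats = F.grid_sample(feat_map, grid, align_corners=True)         # (1, C, N, 1)
        feats = feats[0, :, :, 0].t()                                     # (N, C)
        out = self.mlp(torch.cat([feats, points], dim=-1))
        return torch.relu(out[:, :1]), torch.sigmoid(out[:, 1:])          # sigma, rgb

model = ImageConditionedField()
sigma, rgb = model(torch.rand(1, 3, 128, 128), torch.rand(512, 3), torch.rand(512, 2) * 2 - 1)
```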
arXiv Detail & Related papers (2020-12-03T18:59:54Z)