ViewFormer: NeRF-free Neural Rendering from Few Images Using
Transformers
- URL: http://arxiv.org/abs/2203.10157v1
- Date: Fri, 18 Mar 2022 21:08:23 GMT
- Title: ViewFormer: NeRF-free Neural Rendering from Few Images Using
Transformers
- Authors: Jon\'a\v{s} Kulh\'anek and Erik Derner and Torsten Sattler and Robert
Babu\v{s}ka
- Abstract summary: We consider a variant of novel view synthesis where only a few context views sparsely covering a scene or an object are given.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
- Score: 34.4824364161812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis is a long-standing problem. In this work, we consider a
variant of the problem where we are given only a few context views sparsely
covering a scene or an object. The goal is to predict novel viewpoints in the
scene, which requires learning priors. The current state of the art is based on
Neural Radiance Fields (NeRFs), and while achieving impressive results, the
methods suffer from long training times as they require evaluating thousands of
3D point samples via a deep neural network for each image. We propose a 2D-only
method that maps multiple context views and a query pose to a new image in a
single pass of a neural network. Our model uses a two-stage architecture
consisting of a codebook and a transformer model. The codebook is used to embed
individual images into a smaller latent space, and the transformer solves the
view synthesis task in this more compact space. To train our model efficiently,
we introduce a novel branching attention mechanism that allows us to use the
same model not only for neural rendering but also for camera pose estimation.
Experimental results on real-world scenes show that our approach is competitive
with NeRF-based methods while not reasoning in 3D, and it is faster to train.
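
The abstract outlines a two-stage design: a codebook model that compresses each image into a small grid of discrete latent tokens, and a transformer that performs view synthesis in that compact token space. The sketch below illustrates this general idea in PyTorch; all module names, layer sizes, the simple nearest-neighbour quantizer, and the 7-dimensional pose encoding are illustrative assumptions rather than the authors' implementation, and the paper's branching-attention mechanism and the decoder back to pixels are omitted.

```python
import torch
import torch.nn as nn


class Codebook(nn.Module):
    """Stage 1 (illustrative): encode an image into discrete codebook indices."""

    def __init__(self, num_codes=1024, code_dim=256):
        super().__init__()
        self.codes = nn.Embedding(num_codes, code_dim)
        # Tiny conv encoder: a 128x128 RGB image -> an 8x8 grid of features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv2d(64, code_dim, kernel_size=4, stride=4),
        )

    def encode(self, images):                               # images: (B, 3, 128, 128)
        feats = self.encoder(images)                         # (B, D, 8, 8)
        feats = feats.flatten(2).transpose(1, 2)             # (B, 64, D)
        # Nearest-neighbour lookup: squared distance to every codebook entry.
        w = self.codes.weight                                # (num_codes, D)
        dists = (feats.pow(2).sum(-1, keepdim=True)
                 - 2 * feats @ w.t()
                 + w.pow(2).sum(-1))                         # (B, 64, num_codes)
        return dists.argmin(dim=-1)                          # (B, 64) token indices


class LatentTransformer(nn.Module):
    """Stage 2 (illustrative): predict query-view tokens from context tokens and a pose."""

    def __init__(self, num_codes=1024, dim=256, num_views=4, tokens_per_view=64):
        super().__init__()
        self.tokens_per_view = tokens_per_view
        self.token_embed = nn.Embedding(num_codes, dim)
        self.pose_embed = nn.Linear(7, dim)   # e.g. translation + quaternion (assumed)
        # Learned placeholders for the query view's token positions.
        self.query_slots = nn.Parameter(torch.zeros(1, tokens_per_view, dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, num_views * tokens_per_view + tokens_per_view + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=6)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, context_tokens, query_pose):
        # context_tokens: (B, num_views * tokens_per_view), query_pose: (B, 7)
        b = context_tokens.size(0)
        ctx = self.token_embed(context_tokens)               # (B, N_ctx, dim)
        pose = self.pose_embed(query_pose).unsqueeze(1)      # (B, 1, dim)
        slots = self.query_slots.expand(b, -1, -1)           # (B, N_q, dim)
        x = torch.cat([pose, ctx, slots], dim=1) + self.pos_embed
        x = self.backbone(x)
        # Logits over codebook entries for the query-view token positions only.
        return self.head(x[:, -self.tokens_per_view:])       # (B, N_q, num_codes)
```

A usage sketch under these assumptions: encode each 128x128 context view with `Codebook.encode`, concatenate the resulting `(B, 64)` index grids into a `(B, 256)` sequence, and pass them together with a `(B, 7)` query pose to `LatentTransformer` to obtain per-position logits over codebook entries for the query view. Decoding those indices back to pixels would require the codebook's decoder, which is omitted here, as is the branching attention that lets the paper's single model also perform camera pose estimation.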
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z) - A Comparison of Tiny-nerf versus Spatial Representations for 3d
Reconstruction [0.0]
We compare a trending neural rendering method, tiny-NeRF, with other volume representations that are commonly used as maps in robotics.
Experiments show that tiny-NeRF requires three times less memory space compared to other representations.
In terms of processing time, tiny-NeRF takes about six times more to compute the model.
arXiv Detail & Related papers (2023-01-27T03:55:36Z) - One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views
of Novel Scenes [48.0304999503795]
We introduce Stereo Radiance Fields (SRF), a neural view synthesis approach that is trained end-to-end.
SRF generalizes to new scenes, and requires only sparse views at test time.
Experiments show that SRF learns structure instead of overfitting on a scene.
arXiv Detail & Related papers (2021-04-14T15:38:57Z) - pixelNeRF: Neural Radiance Fields from One or Few Images [20.607712035278315]
pixelNeRF is a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.
We conduct experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects.
In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction.
arXiv Detail & Related papers (2020-12-03T18:59:54Z) - Continuous Object Representation Networks: Novel View Synthesis without
Target View Supervision [26.885846254261626]
Continuous Object Representation Networks (CORN) is a conditional architecture that encodes an input image's geometry and appearance into a 3D-consistent scene representation.
CORN performs well on challenging tasks such as novel view synthesis and single-view 3D reconstruction, achieving performance comparable to state-of-the-art approaches that use direct supervision.
arXiv Detail & Related papers (2020-07-30T17:49:44Z)