Sequential View Synthesis with Transformer
- URL: http://arxiv.org/abs/2004.04548v2
- Date: Tue, 22 Sep 2020 08:53:28 GMT
- Title: Sequential View Synthesis with Transformer
- Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila
- Abstract summary: We introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations.
We evaluate our model on various challenging datasets and demonstrate that it not only gives consistent predictions but also does not require any retraining for fine-tuning.
- Score: 13.200139959163574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of novel view synthesis by means of neural
rendering, where we are interested in predicting the novel view at an arbitrary
camera pose based on a given set of input images from other viewpoints. Using
the known query pose and input poses, we create an ordered set of observations
that leads to the target view. Thus, the problem of single novel view synthesis
is reformulated as a sequential view prediction task. In this paper, the
proposed Transformer-based Generative Query Network (T-GQN) extends the
neural-rendering methods by adding two new concepts. First, we use multi-view
attention learning between context images to obtain multiple implicit scene
representations. Second, we introduce a sequential rendering decoder to predict
an image sequence, including the target view, based on the learned
representations. Finally, we evaluate our model on various challenging datasets
and demonstrate that it not only gives consistent predictions but also
does not require any retraining for fine-tuning.
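The abstract names two additions: multi-view attention between context images, and a sequential rendering decoder that predicts an ordered image sequence ending at the target view. The PyTorch sketch below shows one way those two pieces could fit together; every dimension, module choice (multi-head self-attention, a GRU cell, a 16x16 toy image head), and name is an illustrative assumption, not the authors' exact architecture.

```python
# Minimal sketch of the two T-GQN ideas; sizes and modules are assumptions.
import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    """Self-attention across per-context-view scene representations."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, context_reprs):           # (B, num_views, dim)
        out, _ = self.attn(context_reprs, context_reprs, context_reprs)
        return out                              # one representation per view

class SequentialRenderingDecoder(nn.Module):
    """Renders an ordered sequence of views, ending at the target view."""
    def __init__(self, dim=256, pose_dim=7, img_ch=3):
        super().__init__()
        self.rnn = nn.GRUCell(dim + pose_dim, dim)
        self.to_img = nn.Sequential(
            nn.Linear(dim, 16 * 16 * img_ch),
            nn.Unflatten(1, (img_ch, 16, 16)))

    def forward(self, reprs, poses):            # (B, T, dim), (B, T, pose_dim)
        h = reprs.new_zeros(reprs.size(0), self.rnn.hidden_size)
        frames = []
        for t in range(reprs.size(1)):          # predict each view in order
            h = self.rnn(torch.cat([reprs[:, t], poses[:, t]], dim=-1), h)
            frames.append(self.to_img(h))
        return torch.stack(frames, dim=1)       # last frame = target view

# Example: 3 context views, a 4-step render sequence ending at the target.
attended = MultiViewAttention()(torch.randn(2, 3, 256))        # (2, 3, 256)
seq_reprs = attended[:, -1:].expand(-1, 4, -1)  # reuse last representation
frames = SequentialRenderingDecoder()(seq_reprs, torch.randn(2, 4, 7))
print(frames.shape)                             # (2, 4, 3, 16, 16)
```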
Related papers
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- im2nerf: Image to Neural Radiance Field in the Wild [47.18702901448768]
im2nerf is a learning framework that predicts a continuous neural object representation given a single input image in the wild.
We show that im2nerf achieves state-of-the-art performance for novel view synthesis from a single unposed image in the wild.
arXiv Detail & Related papers (2022-09-08T23:28:56Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) conditioned on the learned 3D representation to perform volume rendering (a minimal sketch of this step appears after this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic renderings of humans under novel views and poses given a monocular video as input.
It significantly outperforms existing approaches on unseen poses and novel views.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Novel View Synthesis from a Single Image via Unsupervised Learning [27.639536023956122]
We propose an unsupervised network that learns a pixel-level transformation from a single source viewpoint.
The learned transformation allows us to synthesize a novel view from a single source-view image of unknown pose.
arXiv Detail & Related papers (2021-10-29T06:32:49Z)
- Deep Learning based Novel View Synthesis [18.363945964373553]
We propose a deep convolutional neural network (CNN) that learns to predict novel views of a scene from a given collection of images.
In contrast to prior deep-learning-based approaches, which handle only a fixed number of input images, the proposed approach works with varying numbers of input images.
arXiv Detail & Related papers (2021-07-14T16:15:36Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame (see the mesh-extraction sketch after this list).
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
- Unsupervised Novel View Synthesis from a Single Image [47.37120753568042]
Novel view synthesis from a single image aims at generating novel views from a single input image of an object.
This work relaxes that assumption, enabling training of a conditional generative model for novel view synthesis in a completely unsupervised manner.
arXiv Detail & Related papers (2021-02-05T16:56:04Z)
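The Vision Transformer entry above trains an MLP conditioned on a learned 3D representation to perform volume rendering. That entry gives no formulas, so the following is a minimal sketch of the standard NeRF-style volume-rendering quadrature it presumably builds on; the function name and the assumption that per-sample densities and colors are already available are mine, not the paper's.

```python
import torch

def volume_render(densities, colors, deltas):
    """Standard NeRF-style quadrature along one ray.
    densities: (N,) non-negative sigma at each sample
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance between consecutive samples
    """
    alpha = 1.0 - torch.exp(-densities * deltas)   # per-sample opacity
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans
    return (weights[:, None] * colors).sum(dim=0)  # rendered RGB for the ray
```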
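The Shelf-Supervised Mesh Prediction entry describes converting a coarse volumetric prediction into a mesh before refinement. Below is a minimal sketch of that volumetric-to-mesh step using marching cubes; the toy sphere occupancy, the 0.5 iso-level, and the use of scikit-image are assumptions standing in for the paper's actual predicted volume and conversion.

```python
import numpy as np
from skimage import measure

# Toy occupancy volume standing in for a network's coarse prediction:
# a filled sphere in a 64^3 grid.
grid = np.indices((64, 64, 64)).astype(np.float32)
occupancy = (np.linalg.norm(grid - 32.0, axis=0) < 20.0).astype(np.float32)

# Volumetric-to-mesh conversion via marching cubes; the paper's refinement
# step in the predicted camera frame is not sketched here.
verts, faces, normals, _ = measure.marching_cubes(occupancy, level=0.5)
print(verts.shape, faces.shape)   # vertex positions and triangle indices
```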
This list is automatically generated from the titles and abstracts of the papers on this site.