Sequential View Synthesis with Transformer
- URL: http://arxiv.org/abs/2004.04548v2
- Date: Tue, 22 Sep 2020 08:53:28 GMT
- Title: Sequential View Synthesis with Transformer
- Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila
- Abstract summary: We introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations.
We evaluate our model on various challenging datasets and demonstrate that it not only gives consistent predictions but also does not require any retraining for fine-tuning.
- Score: 13.200139959163574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of novel view synthesis by means of neural
rendering, where we are interested in predicting the novel view at an arbitrary
camera pose based on a given set of input images from other viewpoints. Using
the known query pose and input poses, we create an ordered set of observations
that leads to the target view. Thus, the problem of single novel view synthesis
is reformulated as a sequential view prediction task. In this paper, the
proposed Transformer-based Generative Query Network (T-GQN) extends the
neural-rendering methods by adding two new concepts. First, we use multi-view
attention learning between context images to obtain multiple implicit scene
representations. Second, we introduce a sequential rendering decoder to predict
an image sequence, including the target view, based on the learned
representations. Finally, we evaluate our model on various challenging datasets
and demonstrate that it not only gives consistent predictions but also
does not require any retraining for fine-tuning.
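The abstract names two additions: multi-view attention between context images, and a sequential rendering decoder that predicts an ordered image sequence ending at the target view. The PyTorch sketch below shows one way those two pieces could fit together; every dimension, module choice (multi-head self-attention, a GRU cell, a 16x16 toy image head), and name is an illustrative assumption, not the authors' exact architecture.

```python
# Minimal sketch of the two T-GQN ideas; sizes and modules are assumptions.
import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    """Self-attention across per-context-view scene representations."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, context_reprs):           # (B, num_views, dim)
        out, _ = self.attn(context_reprs, context_reprs, context_reprs)
        return out                              # one representation per view

class SequentialRenderingDecoder(nn.Module):
    """Renders an ordered sequence of views, ending at the target view."""
    def __init__(self, dim=256, pose_dim=7, img_ch=3):
        super().__init__()
        self.rnn = nn.GRUCell(dim + pose_dim, dim)
        self.to_img = nn.Sequential(
            nn.Linear(dim, 16 * 16 * img_ch),
            nn.Unflatten(1, (img_ch, 16, 16)))

    def forward(self, reprs, poses):            # (B, T, dim), (B, T, pose_dim)
        h = reprs.new_zeros(reprs.size(0), self.rnn.hidden_size)
        frames = []
        for t in range(reprs.size(1)):          # predict each view in order
            h = self.rnn(torch.cat([reprs[:, t], poses[:, t]], dim=-1), h)
            frames.append(self.to_img(h))
        return torch.stack(frames, dim=1)       # last frame = target view

# Example: 3 context views, a 4-step render sequence ending at the target.
attended = MultiViewAttention()(torch.randn(2, 3, 256))        # (2, 3, 256)
seq_reprs = attended[:, -1:].expand(-1, 4, -1)  # reuse last representation
frames = SequentialRenderingDecoder()(seq_reprs, torch.randn(2, 4, 7))
print(frames.shape)                             # (2, 4, 3, 16, 16)
```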
Related papers
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- im2nerf: Image to Neural Radiance Field in the Wild [47.18702901448768]
im2nerf is a learning framework that predicts a continuous neural object representation given a single input image in the wild.
We show that im2nerf achieves state-of-the-art performance for novel view synthesis from a single unposed image in the wild.
arXiv Detail & Related papers (2022-09-08T23:28:56Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) conditioned on the learned 3D representation to perform volume rendering (a minimal sketch of this step appears after this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic renderings of humans under novel views and poses given a monocular video as input.
It significantly outperforms existing approaches on unseen poses and novel views.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Novel View Synthesis from a Single Image via Unsupervised Learning [27.639536023956122]
We propose an unsupervised network that learns a pixel-level transformation from a single source viewpoint.
The learned transformation allows us to synthesize a novel view from a single source-view image of unknown pose.
arXiv Detail & Related papers (2021-10-29T06:32:49Z)
- Deep Learning based Novel View Synthesis [18.363945964373553]
We propose a deep convolutional neural network (CNN) that learns to predict novel views of a scene from a given collection of images.
In contrast to prior deep-learning-based approaches, which handle only a fixed number of input images, the proposed approach works with varying numbers of input images.
arXiv Detail & Related papers (2021-07-14T16:15:36Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame (see the mesh-extraction sketch after this list).
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
- Unsupervised Novel View Synthesis from a Single Image [47.37120753568042]
Novel view synthesis from a single image aims at generating novel views from a single input image of an object.
This work relaxes that assumption, enabling training of a conditional generative model for novel view synthesis in a completely unsupervised manner.
arXiv Detail & Related papers (2021-02-05T16:56:04Z)
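The Vision Transformer entry above trains an MLP conditioned on a learned 3D representation to perform volume rendering. That entry gives no formulas, so the following is a minimal sketch of the standard NeRF-style volume-rendering quadrature it presumably builds on; the function name and the assumption that per-sample densities and colors are already available are mine, not the paper's.

```python
import torch

def volume_render(densities, colors, deltas):
    """Standard NeRF-style quadrature along one ray.
    densities: (N,) non-negative sigma at each sample
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance between consecutive samples
    """
    alpha = 1.0 - torch.exp(-densities * deltas)   # per-sample opacity
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans
    return (weights[:, None] * colors).sum(dim=0)  # rendered RGB for the ray
```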
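The Shelf-Supervised Mesh Prediction entry describes converting a coarse volumetric prediction into a mesh before refinement. Below is a minimal sketch of that volumetric-to-mesh step using marching cubes; the toy sphere occupancy, the 0.5 iso-level, and the use of scikit-image are assumptions standing in for the paper's actual predicted volume and conversion.

```python
import numpy as np
from skimage import measure

# Toy occupancy volume standing in for a network's coarse prediction:
# a filled sphere in a 64^3 grid.
grid = np.indices((64, 64, 64)).astype(np.float32)
occupancy = (np.linalg.norm(grid - 32.0, axis=0) < 20.0).astype(np.float32)

# Volumetric-to-mesh conversion via marching cubes; the paper's refinement
# step in the predicted camera frame is not sketched here.
verts, faces, normals, _ = measure.marching_cubes(occupancy, level=0.5)
print(verts.shape, faces.shape)   # vertex positions and triangle indices
```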
This list is automatically generated from the titles and abstracts of the papers on this site.