Deep View Synthesis via Self-Consistent Generative Network
- URL: http://arxiv.org/abs/2101.10844v1
- Date: Tue, 19 Jan 2021 10:56:00 GMT
- Title: Deep View Synthesis via Self-Consistent Generative Network
- Authors: Zhuoman Liu, Wei Jia, Ming Yang, Peiyao Luo, Yong Guo, and Mingkui Tan
- Abstract summary: View synthesis aims to produce unseen views from a set of views captured by two or more cameras at different positions.
Pixel-level matching across different views is difficult, so most existing methods seek to exploit geometric information to match pixels.
We propose a novel deep generative model, called Self-Consistent Generative Network (SCGN), which synthesizes novel views without explicitly exploiting the geometric information.
- Score: 41.34461086700849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: View synthesis aims to produce unseen views from a set of views captured by
two or more cameras at different positions. This task is non-trivial since it
is hard to conduct pixel-level matching among different views. To address this
issue, most existing methods seek to exploit the geometric information to match
pixels. However, when the cameras have a large baseline (i.e., are far
away from each other), severe geometric distortions occur and the
geometric information may fail to provide useful guidance, resulting in very
blurry synthesized images. To address the above issues, in this paper, we
propose a novel deep generative model, called Self-Consistent Generative
Network (SCGN), which synthesizes novel views from the given input views
without explicitly exploiting the geometric information. The proposed SCGN
model consists of two main components, i.e., a View Synthesis Network (VSN) and
a View Decomposition Network (VDN), both employing an Encoder-Decoder
structure. Here, the VDN seeks to reconstruct input views from the synthesized
novel view to preserve the consistency of view synthesis. Thanks to VDN, SCGN
is able to synthesize novel views without using any geometric rectification
before encoding, making it easier for both training and applications. Finally,
an adversarial loss is introduced to improve the photo-realism of novel views.
Both qualitative and quantitative comparisons against several state-of-the-art
methods on two benchmark tasks demonstrated the superiority of our approach.
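The abstract specifies only the high-level structure of SCGN: two encoder-decoder sub-networks (VSN and VDN), a self-consistency constraint that reconstructs the input views from the synthesized one, and an adversarial term. Below is a minimal PyTorch sketch of how that pairing and objective could be wired together; the layer configuration, loss forms, and the weight `lambda_adv` are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderDecoder(nn.Module):
    """Small encoder-decoder used for both sub-networks (layer sizes are assumptions)."""
    def __init__(self, in_ch, out_ch, width=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, out_ch, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class SCGN(nn.Module):
    """VSN synthesizes a novel view from two input views; VDN decomposes the
    synthesized view back into the two inputs to enforce self-consistency."""
    def __init__(self, ch=3):
        super().__init__()
        self.vsn = EncoderDecoder(in_ch=2 * ch, out_ch=ch)   # View Synthesis Network
        self.vdn = EncoderDecoder(in_ch=ch, out_ch=2 * ch)   # View Decomposition Network

    def forward(self, view_a, view_b):
        novel = self.vsn(torch.cat([view_a, view_b], dim=1))
        rec_a, rec_b = self.vdn(novel).chunk(2, dim=1)
        return novel, rec_a, rec_b

def scgn_generator_loss(model, view_a, view_b, disc, lambda_adv=0.01):
    """One generator step: self-consistency reconstruction of the inputs plus an
    adversarial term on the synthesized view (loss forms and weight are assumed)."""
    novel, rec_a, rec_b = model(view_a, view_b)
    consistency = F.l1_loss(rec_a, view_a) + F.l1_loss(rec_b, view_b)
    adversarial = -disc(novel).mean()   # simple hinge-style generator term (assumption)
    return consistency + lambda_adv * adversarial
```

A discriminator trained against real views would supply `disc`; since the abstract does not describe its architecture, it is left as a callable here.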
Related papers
- GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping [47.38125925469167]
We propose a semantic-preserving generative warping framework to generate novel views from a single image.
Our approach addresses the limitations of existing methods by conditioning the generative model on source view images.
Our model outperforms existing methods in both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2024-05-27T15:07:04Z)
- G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach.
To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z)
- Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models [16.326276673056334]
Consistent-1-to-3 is a generative framework that significantly mitigates inconsistency across synthesized views.
We decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
We propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information (see the illustrative sketch after this list).
arXiv Detail & Related papers (2023-10-04T17:58:57Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- Geometry-biased Transformers for Novel View Synthesis [36.11342728319563]
We tackle the task of synthesizing novel views of an object given a few input images and associated camera viewpoints.
Our work is inspired by recent 'geometry-free' approaches where multi-view images are encoded as a (global) set-latent representation.
We propose 'Geometry-biased Transformers' (GBTs) that incorporate geometric inductive biases in the set-latent representation-based inference.
arXiv Detail & Related papers (2023-01-11T18:59:56Z)
- Novel View Synthesis from a Single Image via Unsupervised learning [27.639536023956122]
We propose an unsupervised network to learn such a pixel transformation from a single source viewpoint.
The learned transformation allows us to synthesize a novel view from any single source viewpoint image of unknown pose.
arXiv Detail & Related papers (2021-10-29T06:32:49Z)
- Self-Supervised Visibility Learning for Novel View Synthesis [79.53158728483375]
Conventional rendering methods estimate scene geometry and synthesize novel views in two separate steps.
We propose an end-to-end NVS framework to eliminate the error propagation issue.
Our network is trained in an end-to-end self-supervised fashion, thus significantly alleviating error accumulation in view synthesis.
arXiv Detail & Related papers (2021-03-29T08:11:25Z)
- Street-view Panoramic Video Synthesis from a Single Satellite Image [92.26826861266784]
We present a novel method for synthesizing both temporally and geometrically consistent street-view panoramic video.
Existing cross-view synthesis approaches focus more on images, while video synthesis in such a case has not yet received enough attention.
arXiv Detail & Related papers (2020-12-11T20:22:38Z)
- Free View Synthesis [100.86844680362196]
We present a method for novel view synthesis from input images that are freely distributed around a scene.
Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts.
arXiv Detail & Related papers (2020-08-12T18:16:08Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
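Several of the entries above rely on geometry-aware attention; the Consistent-1-to-3 entry, for instance, mentions epipolar-guided attention. As an illustration only (none of the listed papers release this exact code here), a minimal PyTorch sketch of an epipolar bias added to cross-view attention logits follows; the precomputed fundamental matrix, the coordinate layout, and the temperature `sigma` are assumptions.

```python
import torch

def epipolar_attention_bias(fund_mat, tgt_xy, src_xy, sigma=2.0):
    """Soft bias that favours source pixels near the epipolar line of each target
    pixel. `fund_mat` (3x3) maps homogeneous target pixels to source epipolar
    lines, l_src = fund_mat @ [u_t, v_t, 1]^T, and is assumed precomputed from
    the relative pose and intrinsics."""
    ones_t = torch.ones_like(tgt_xy[:, :1])
    lines = torch.cat([tgt_xy, ones_t], dim=1) @ fund_mat.T             # (Nt, 3): [a, b, c]
    src_h = torch.cat([src_xy, torch.ones_like(src_xy[:, :1])], dim=1)  # (Ns, 3)
    # point-to-line distance |a*u + b*v + c| / sqrt(a^2 + b^2)
    dist = (lines @ src_h.T).abs() / lines[:, :2].norm(dim=1, keepdim=True).clamp_min(1e-8)
    return -(dist / sigma) ** 2                                          # (Nt, Ns) additive bias

def epipolar_guided_attention(q, k, v, fund_mat, tgt_xy, src_xy):
    """Scaled dot-product attention from target queries to source keys/values,
    with the epipolar bias added to the attention logits."""
    logits = q @ k.T / q.shape[-1] ** 0.5                                # (Nt, Ns)
    logits = logits + epipolar_attention_bias(fund_mat, tgt_xy, src_xy)
    return torch.softmax(logits, dim=-1) @ v                             # (Nt, d_v)
```

Multi-view aggregation in the same spirit would compute one such biased attention per source view and merge the results.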
This list is automatically generated from the titles and abstracts of the papers on this site.