Self-Supervised Visibility Learning for Novel View Synthesis
- URL: http://arxiv.org/abs/2103.15407v1
- Date: Mon, 29 Mar 2021 08:11:25 GMT
- Title: Self-Supervised Visibility Learning for Novel View Synthesis
- Authors: Yujiao Shi, Hongdong Li, Xin Yu
- Abstract summary: Conventional rendering methods estimate scene geometry and synthesize novel views in two separate steps.
We propose an end-to-end NVS framework to eliminate the error propagation issue.
Our network is trained in an end-to-end self-supervised fashion, thus significantly alleviating error accumulation in view synthesis.
- Score: 79.53158728483375
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We address the problem of novel view synthesis (NVS) from a few sparse source
view images. Conventional image-based rendering methods estimate scene geometry
and synthesize novel views in two separate steps. However, erroneous geometry
estimation degrades NVS performance, as view synthesis depends heavily on the
quality of the estimated scene geometry. In this paper, we propose an
end-to-end NVS framework to eliminate this error propagation issue. Specifically,
we construct a volume under the target view and design a source-view
visibility estimation (SVE) module to determine the visibility of the
target-view voxels in each source view. Next, we aggregate the visibility of
all source views to obtain a consensus volume, in which each voxel indicates a
surface existence probability. We then present a soft ray-casting (SRC)
mechanism to locate the front-most surface in the target view (i.e., its depth):
SRC traverses the consensus volume along viewing rays and estimates a depth
probability distribution. Based on the estimated source-view visibility and
target-view depth, we warp and aggregate source-view pixels to synthesize the
novel view. Finally, the network is trained in an end-to-end self-supervised
fashion, which significantly alleviates error accumulation in view synthesis.
Experimental results demonstrate that our method generates novel views of
higher quality than the state of the art.
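The middle of this pipeline can be illustrated with a compact sketch. Below is a minimal NumPy illustration of two steps, written as one plausible formulation rather than the authors' implementation: fusing per-source-view visibility volumes into a consensus volume (a simple mean is assumed here), and a soft ray-casting pass that converts the consensus volume into a per-ray depth probability distribution and an expected depth map. The fronto-parallel depth-plane organization of the volume and the mean aggregation are assumptions not spelled out in the abstract.

```python
import numpy as np

def aggregate_consensus(per_view_visibility):
    """Fuse per-source-view visibility volumes into a single consensus volume.

    per_view_visibility: (V, D, H, W) array; entry [v, d, y, x] is the estimated
    probability that target-view voxel (d, y, x) is visible in source view v.
    A simple mean is used here; the exact aggregation operator is an assumption.
    """
    return per_view_visibility.mean(axis=0)                    # (D, H, W)

def soft_ray_casting(consensus, depth_values):
    """Convert a consensus volume into a per-pixel depth probability distribution.

    consensus:    (D, H, W) array; consensus[d, y, x] is the probability that a
                  surface exists at depth plane d along the ray through pixel (y, x).
    depth_values: (D,) depth hypotheses of the planes, ordered near to far.
    """
    # Probability that the ray is still unoccluded when it reaches plane d:
    # the product of (1 - surface probability) over all nearer planes.
    free_space = np.cumprod(1.0 - consensus, axis=0)
    transmittance = np.concatenate(
        [np.ones_like(consensus[:1]), free_space[:-1]], axis=0)

    # Probability that plane d hosts the first (front-most) surface along the ray.
    first_surface = transmittance * consensus                  # (D, H, W)
    first_surface /= first_surface.sum(axis=0, keepdims=True) + 1e-8

    # Expected depth per pixel under the estimated distribution.
    expected_depth = (first_surface * depth_values[:, None, None]).sum(axis=0)
    return first_surface, expected_depth
```

For example, with D fronto-parallel planes between a near and far bound, `depth_values = np.linspace(d_min, d_max, D)`; the returned per-plane distribution gives both a soft depth estimate and weights that downstream steps can reuse.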
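The final compositing step can be sketched in a similarly hedged way. The code below backprojects target pixels using the estimated depth, projects them into each source view, and blends the sampled colors with the per-source visibility maps as weights. The camera conventions (3x3 intrinsics `K`, 4x4 camera-to-world and world-to-camera extrinsics `T`), the nearest-neighbour sampling, and the normalized weighted average are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def backproject_target_pixels(depth, K_tgt, T_tgt_to_world):
    """Lift every target-view pixel to world coordinates using the estimated depth.

    depth: (H, W) expected depth map from soft ray-casting.
    K_tgt: (3, 3) target intrinsics; T_tgt_to_world: (4, 4) camera-to-world pose.
    Returns world points of shape (3, H*W).
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(float)
    cam = np.linalg.inv(K_tgt) @ pix * depth.reshape(1, -1)
    cam_h = np.concatenate([cam, np.ones((1, cam.shape[1]))], axis=0)
    return (T_tgt_to_world @ cam_h)[:3]

def visibility_weighted_blend(world_pts, src_images, src_visibility,
                              K_srcs, T_world_to_srcs, H, W):
    """Warp each source image to the target view and blend with visibility weights.

    src_images:     list of (Hs, Ws, 3) source images.
    src_visibility: list of (H, W) visibility maps, one per source view, giving the
                    probability that each target pixel's surface is seen in that view.
    """
    accum = np.zeros((H, W, 3))
    weight = np.zeros((H, W, 1))
    for img, vis, K, T in zip(src_images, src_visibility, K_srcs, T_world_to_srcs):
        pts_h = np.concatenate([world_pts, np.ones((1, world_pts.shape[1]))], axis=0)
        cam = (T @ pts_h)[:3]                      # world -> source camera frame
        proj = K @ cam
        u = proj[0] / (proj[2] + 1e-8)
        v = proj[1] / (proj[2] + 1e-8)
        # Nearest-neighbour sampling for brevity; a differentiable implementation
        # would use bilinear grid sampling instead.
        ui = np.clip(np.round(u).astype(int), 0, img.shape[1] - 1)
        vi = np.clip(np.round(v).astype(int), 0, img.shape[0] - 1)
        valid = (cam[2] > 0) & (u >= 0) & (u < img.shape[1]) & (v >= 0) & (v < img.shape[0])
        colors = img[vi, ui].reshape(H, W, 3)
        w = vis.reshape(H, W, 1) * valid.reshape(H, W, 1)
        accum += w * colors
        weight += w
    return accum / (weight + 1e-8)                 # synthesized target view
```

In an end-to-end self-supervised setting these operations would need to be differentiable (e.g., bilinear sampling rather than the nearest-neighbour lookup above) so that the photometric loss on the synthesized view can propagate gradients back to the visibility and depth estimates.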
Related papers
- CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency [18.101763989542828]
We propose a simple yet effective method that explicitly builds depth-aware consistency across input views.
Our key insight is that by forcing the same spatial points to be sampled repeatedly in different input views, we are able to strengthen the interactions between views.
Despite its simplicity, extensive experiments demonstrate that the proposed method achieves better synthesis quality than state-of-the-art methods.
arXiv Detail & Related papers (2024-02-26T09:04:04Z)
- Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance novel view synthesis from images taken by a freely moving camera.
The approach focuses on outdoor scenes, where recovering an accurate geometric scaffold and camera poses is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z)
- ProbNVS: Fast Novel View Synthesis with Learned Probability-Guided Sampling [42.37704606186928]
We propose to build a novel view synthesis framework based on learned MVS priors.
We show that our method achieves 15 to 40 times faster rendering than state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-07T14:45:42Z)
- Content-aware Warping for View Synthesis [110.54435867693203]
We propose content-aware warping, which adaptively learns the weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network.
Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from two source views.
Experimental results on structured light field datasets with wide baselines and unstructured multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-01-22T11:35:05Z)
- NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis [74.4983052902396]
We propose a novel training method split into three main steps to improve monocular depth estimation.
Experimental results show that our method achieves state-of-the-art or comparable performance on the KITTI and NYU-Depth-v2 datasets.
arXiv Detail & Related papers (2021-12-22T12:21:08Z)
- Novel View Synthesis from a Single Image via Unsupervised learning [27.639536023956122]
We propose an unsupervised network to learn a pixel transformation from a single source viewpoint.
The learned transformation allows us to synthesize a novel view from any single source viewpoint image of unknown pose.
arXiv Detail & Related papers (2021-10-29T06:32:49Z)
- Deep View Synthesis via Self-Consistent Generative Network [41.34461086700849]
View synthesis aims to produce unseen views from a set of views captured by two or more cameras at different positions.
To address this task, most existing methods exploit geometric information to match pixels across views.
We propose a novel deep generative model, called Self-Consistent Generative Network (SCGN), which synthesizes novel views without explicitly exploiting geometric information.
arXiv Detail & Related papers (2021-01-19T10:56:00Z)
- Stable View Synthesis [100.86844680362196]
We present Stable View Synthesis (SVS).
Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene.
SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets.
arXiv Detail & Related papers (2020-11-14T07:24:43Z)
- Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera [93.04135520894631]
This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene.
A key challenge for novel view synthesis arises from dynamic scene reconstruction, where epipolar geometry does not apply to the local motion of dynamic content.
To address this challenge, we propose to combine the depth from single view (DSV) and the depth from multi-view stereo (DMV), where DSV is complete, i.e., a depth is assigned to every pixel, yet view-variant in its scale, while DMV is view-invariant yet incomplete.
arXiv Detail & Related papers (2020-04-02T22:45:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.