Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
- URL: http://arxiv.org/abs/2304.10700v2
- Date: Mon, 21 Aug 2023 19:01:42 GMT
- Title: Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
- Authors: Jason J. Yu, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker
- Abstract summary: We propose a novel generative model capable of producing a sequence of photorealistic images consistent with a specified camera trajectory.
To measure the consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED).
- Score: 24.301334966272297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis from a single input image is a challenging task, where
the goal is to generate a new view of a scene from a desired camera pose that
may be separated by a large motion. The highly uncertain nature of this
synthesis task due to unobserved elements within the scene (i.e., occlusions) and
outside the field-of-view makes the use of generative models appealing to
capture the variety of possible outputs. In this paper, we propose a novel
generative model capable of producing a sequence of photorealistic images
consistent with a specified camera trajectory, and a single starting image. Our
approach is centred on an autoregressive conditional diffusion-based model
capable of interpolating visible scene elements, and extrapolating unobserved
regions in a view, in a geometrically consistent manner. Conditioning is
limited to an image capturing a single camera view and the (relative) pose of
the new camera view. To measure the consistency over a sequence of generated
views, we introduce a new metric, the thresholded symmetric epipolar distance
(TSED), which counts the number of consistent frame pairs in a sequence. While
previous methods have been shown to produce high quality images and consistent
semantics across pairs of views, we show empirically with our metric that they
are often inconsistent with the desired camera poses. In contrast, we
demonstrate that our method produces both photorealistic and view-consistent
imagery.
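The autoregressive generation described above (each new frame is sampled conditioned only on the most recently generated frame and the relative pose to the next camera) can be summarized by the minimal sketch below. The `model.sample` interface, its argument names, and the data types are assumptions for illustration; the paper's actual conditioning mechanism and diffusion sampler are not reproduced here.

```python
def generate_trajectory(model, start_image, relative_poses):
    """Autoregressive novel view synthesis along a camera trajectory (sketch).

    model: a conditional diffusion model (hypothetical interface) that samples a
        new view given one conditioning image and the relative pose of the new camera.
    start_image: the single observed input view.
    relative_poses: relative camera poses, one per view to generate.
    """
    views = [start_image]
    for pose in relative_poses:
        # Condition only on the last generated frame and the relative pose,
        # as described in the abstract; earlier frames are not revisited.
        next_view = model.sample(cond_image=views[-1], rel_pose=pose)
        views.append(next_view)
    return views
```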
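As a rough illustration of the TSED metric, the sketch below computes a per-match symmetric epipolar distance and counts frame pairs whose median distance falls below a pixel threshold. The keypoint matcher, the fundamental matrix F implied by the requested relative pose, the use of the median, and the thresholds `t_error` / `t_matches` are assumptions here; the paper's exact definition and aggregation may differ.

```python
import numpy as np

def point_line_distance(pts_h, lines):
    """Distance from homogeneous 2D points (N, 3) to lines ax + by + c = 0 (N, 3)."""
    num = np.abs(np.sum(pts_h * lines, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return num / den

def symmetric_epipolar_distance(x1, x2, F):
    """Mean of the two point-to-epipolar-line distances for matches x1 <-> x2.

    x1, x2: (N, 2) matched pixel coordinates in the two views; F: 3x3 fundamental matrix.
    """
    x1_h = np.hstack([x1, np.ones((len(x1), 1))])
    x2_h = np.hstack([x2, np.ones((len(x2), 1))])
    d1 = point_line_distance(x2_h, x1_h @ F.T)  # distance of x2 to epipolar lines F @ x1
    d2 = point_line_distance(x1_h, x2_h @ F)    # distance of x1 to epipolar lines F.T @ x2
    return 0.5 * (d1 + d2)

def tsed(frame_pairs, t_error=2.0, t_matches=10):
    """Count frame pairs whose median symmetric epipolar distance is below a threshold.

    frame_pairs: iterable of (x1, x2, F) tuples, one per consecutive frame pair.
    t_error, t_matches: illustrative defaults, not the paper's reported settings.
    """
    consistent = 0
    for x1, x2, F in frame_pairs:
        if len(x1) < t_matches:
            continue  # too few matches to judge consistency
        if np.median(symmetric_epipolar_distance(x1, x2, F)) < t_error:
            consistent += 1
    return consistent
```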
Related papers
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z) - Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model into a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z) - DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis [18.64688172651478]
We present DiffPortrait3D, a conditional diffusion model capable of synthesizing 3D-consistent photo-realistic novel views.
Given a single RGB input, we aim to synthesize plausible yet consistent facial details rendered from novel camera views.
We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.
arXiv Detail & Related papers (2023-12-20T13:31:11Z) - UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z) - Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset based on the visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z) - Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z) - Consistent View Synthesis with Pose-Guided Diffusion Models [51.37925069307313]
Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications.
We propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image.
arXiv Detail & Related papers (2023-03-30T17:59:22Z) - Zero-1-to-3: Zero-shot One Image to 3D Object [30.455300183998247]
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint.
Our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
arXiv Detail & Related papers (2023-03-20T17:59:50Z) - MELON: NeRF with Unposed Images in SO(3) [35.093700416540436]
Using a neural network to regularize pose estimation, we demonstrate that our method can reconstruct a neural radiance field from unposed images with state-of-the-art accuracy while requiring ten times fewer views than adversarial approaches.
arXiv Detail & Related papers (2023-03-14T17:33:39Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)