ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
- URL: http://arxiv.org/abs/2312.01305v1
- Date: Sun, 3 Dec 2023 06:50:15 GMT
- Title: ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
- Authors: Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi
- Abstract summary: Generating novel views of an object from a single image is a challenging task.
Recent methods for view synthesis based on diffusion have shown great progress.
We demonstrate a simple method, where we utilize a pre-trained video diffusion model.
- Score: 33.760292331843104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating novel views of an object from a single image is a challenging
task. It requires an understanding of the underlying 3D structure of the object
from an image and rendering high-quality, spatially consistent new views. While
recent methods for view synthesis based on diffusion have shown great progress,
achieving consistency among various view estimates and at the same time abiding
by the desired camera pose remains a critical problem yet to be solved. In this
work, we demonstrate a strikingly simple method, where we utilize a pre-trained
video diffusion model to solve this problem. Our key idea is that synthesizing
a novel view could be reformulated as synthesizing a video of a camera going
around the object of interest -- a scanning video -- which then allows us to
leverage the powerful priors that a video diffusion model would have learned.
Thus, to perform novel-view synthesis, we create a smooth camera trajectory to
the target view that we wish to render, and denoise using both a
view-conditioned diffusion model and a video diffusion model. By doing so, we
obtain a highly consistent novel view synthesis, outperforming the state of the
art.
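The joint-denoising idea in the abstract can be sketched in a few lines: frames along a smooth camera path are denoised with a blend of a per-frame view-conditioned model and a whole-clip video model. This is an illustrative toy, not the paper's implementation: `view_model`, `video_model`, the blending weight `alpha`, and the simplified update rule are all stand-ins for the actual pretrained networks and sampler.

```python
import numpy as np

def make_trajectory(target_azimuth_deg, num_frames):
    """Smoothly interpolate camera azimuth from the input view (0 deg)
    to the target view, so the desired novel view is the last frame of
    a 'scanning' video around the object."""
    return np.linspace(0.0, target_azimuth_deg, num_frames)

def joint_denoise_step(frames, t, view_model, video_model, alpha=0.5):
    """One denoising step blending the noise estimates of a
    view-conditioned image diffusion model (applied per frame) and a
    video diffusion model (applied to the whole clip)."""
    eps_view = np.stack([view_model(f, t) for f in frames])  # per-frame estimate
    eps_video = video_model(frames, t)                       # clip-level estimate
    eps = alpha * eps_view + (1.0 - alpha) * eps_video
    return frames - eps  # simplified update; real samplers use DDIM/DDPM math

# Toy stand-ins for the pretrained denoisers (NOT real networks).
view_model = lambda frame, t: 0.1 * frame
video_model = lambda frames, t: 0.1 * frames

trajectory = make_trajectory(90.0, num_frames=8)            # 8 frames, 0 -> 90 deg
frames = np.random.default_rng(0).normal(size=(8, 4, 4))    # tiny latent "video"
for t in reversed(range(10)):                               # reverse diffusion loop
    frames = joint_denoise_step(frames, t, view_model, video_model)
novel_view = frames[-1]  # the last frame corresponds to the target pose
```

The key design point the sketch mirrors is that the video model's prior enforces consistency across all frames of the trajectory, while the view-conditioned model ties each frame to its camera pose.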
Related papers
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
  We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
  Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by point-based representations to generate high-quality video frames.
  arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
  MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
  Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
  arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- FSViewFusion: Few-Shots View Generation of Novel Objects [75.81872204650807]
  We leverage a pretrained Stable Diffusion model for view synthesis without explicit 3D priors.
  Specifically, we base our method on a personalized text-to-image model, DreamBooth, given its strong ability to adapt to specific novel objects with a few shots.
  We establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learned.
  arXiv Detail & Related papers (2024-03-11T02:59:30Z)
- iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views [61.707755434165335]
  iFusion is a novel 3D object reconstruction framework that requires only two views with unknown camera poses.
  We harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects.
  Experiments demonstrate strong performance in both pose estimation and novel view synthesis.
  arXiv Detail & Related papers (2023-12-28T18:59:57Z)
- DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views [20.685453627120832]
  Existing methods often struggle to produce high-quality results or necessitate per-object optimization in such few-view settings.
  DreamSparse is capable of synthesizing high-quality novel views for both object- and scene-level images.
  arXiv Detail & Related papers (2023-06-06T05:26:26Z)
- Consistent View Synthesis with Pose-Guided Diffusion Models [51.37925069307313]
  Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications.
  We propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image.
  arXiv Detail & Related papers (2023-03-30T17:59:22Z)
- Novel View Synthesis with Diffusion Models [56.55571338854636]
  We present 3DiM, a diffusion model for 3D novel view synthesis.
  It is able to translate a single input view into consistent and sharp completions across many views.
  3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning.
  arXiv Detail & Related papers (2022-10-06T16:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.