DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
- URL: http://arxiv.org/abs/2211.12131v2
- Date: Sat, 18 Mar 2023 16:07:21 GMT
- Title: DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
- Authors: Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton
Obukhov, Luc Van Gool and Gordon Wetzstein
- Abstract summary: DiffDreamer is an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory.
We show that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving consistency significantly better than prior GAN-based methods.
- Score: 91.94566873400277
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Scene extrapolation -- the idea of generating novel views by flying into a
given image -- is a promising, yet challenging task. For each predicted frame,
a joint inpainting and 3D refinement problem has to be solved, which is
ill-posed and includes a high level of ambiguity. Moreover, training data for
long-range scenes is difficult to obtain and usually lacks sufficient views to
infer accurate camera poses. We introduce DiffDreamer, an unsupervised
framework capable of synthesizing novel views depicting a long camera
trajectory while training solely on internet-collected images of nature scenes.
Utilizing the stochastic nature of the guided denoising steps, we train the
diffusion models to refine projected RGBD images but condition the denoising
steps on multiple past and future frames for inference. We demonstrate that
image-conditioned diffusion models can effectively perform long-range scene
extrapolation while preserving consistency significantly better than prior
GAN-based methods. DiffDreamer is a powerful and efficient solution for scene
extrapolation, producing impressive results despite limited supervision.
Project page: https://primecai.github.io/diffdreamer.
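To make the conditioning idea in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' released code) of a DDPM-style reverse diffusion loop in which every denoising step for a new frame is conditioned on projected RGBD neighbor frames. The `ConditionalDenoiser` architecture, channel layout, and linear noise schedule are illustrative assumptions.

```python
# Minimal sketch (assumptions, not DiffDreamer's implementation) of
# image-conditioned reverse diffusion: each denoising step sees the noisy
# RGBD frame stacked with K projected past/future conditioning frames.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class ConditionalDenoiser(nn.Module):
    """Hypothetical epsilon-predictor: input is the noisy RGBD frame
    concatenated with K projected neighbor frames, output is predicted noise.
    (Timestep embedding omitted for brevity; a real model would also take t.)"""
    def __init__(self, frame_ch: int = 4, num_cond: int = 2, hidden: int = 64):
        super().__init__()
        in_ch = frame_ch * (1 + num_cond)
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, frame_ch, 3, padding=1),
        )

    def forward(self, x_t, cond):
        return self.net(torch.cat([x_t, cond], dim=1))

@torch.no_grad()
def sample_next_frame(denoiser, cond_frames):
    """Run the reverse chain for one novel view, conditioning every
    stochastic denoising step on the projected past/future RGBD frames."""
    cond = torch.cat(cond_frames, dim=1)      # (B, 4*K, H, W) projected conditions
    x_t = torch.randn_like(cond_frames[0])    # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x_t, cond)             # predict noise given the conditions
        a_t, ab_t = alphas[t], alpha_bars[t]
        mean = (x_t - (1 - a_t) / (1 - ab_t).sqrt() * eps) / a_t.sqrt()
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * noise  # vanilla DDPM stochastic update
    return x_t                                # refined, inpainted RGBD frame

# Toy usage: two 4-channel (RGB + depth) conditioning frames at 64x64.
denoiser = ConditionalDenoiser()
past = torch.randn(1, 4, 64, 64)
future = torch.randn(1, 4, 64, 64)
novel_view = sample_next_frame(denoiser, [past, future])
print(novel_view.shape)  # torch.Size([1, 4, 64, 64])
```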
Related papers
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z)
- DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis [18.64688172651478]
We present DiffPortrait3D, a conditional diffusion model capable of synthesizing 3D-consistent photo-realistic novel views.
Given a single RGB input, we aim to synthesize plausible yet consistent facial details rendered from novel camera views.
We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.
arXiv Detail & Related papers (2023-12-20T13:31:11Z)
- DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z)
- DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models [6.668241588219693]
Visual storytelling is increasingly desired beyond real-world imagery.
Current techniques, which typically use autoregressive decoders, suffer from low inference speed and are not well-suited for synthetic scenes.
We propose a novel diffusion-based system DiffuVST, which models a series of visual descriptions as a single conditional denoising process.
arXiv Detail & Related papers (2023-12-12T08:40:38Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Diffusion-based Generation, Optimization, and Planning in 3D Scenes [89.63179422011254]
We introduce SceneDiffuser, a conditional generative model for 3D scene understanding.
SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented.
We show significant improvements compared with previous models.
arXiv Detail & Related papers (2023-01-15T03:43:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.