Control3Diff: Learning Controllable 3D Diffusion Models from Single-view
Images
- URL: http://arxiv.org/abs/2304.06700v2
- Date: Thu, 26 Oct 2023 05:04:17 GMT
- Title: Control3Diff: Learning Controllable 3D Diffusion Models from Single-view
Images
- Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu and
Josh Susskind
- Abstract summary: Control3Diff is a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis.
We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet.
- Score: 70.17085345196583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently become the de-facto approach for generative
modeling in the 2D domain. However, extending diffusion models to 3D is
challenging due to the difficulties in acquiring 3D ground truth data for
training. On the other hand, 3D GANs that integrate implicit 3D representations
into GANs have shown remarkable 3D-aware generation when trained only on
single-view image datasets. However, 3D GANs do not provide straightforward
ways to precisely control image synthesis. To address these challenges, We
present Control3Diff, a 3D diffusion model that combines the strengths of
diffusion models and 3D GANs for versatile, controllable 3D-aware image
synthesis for single-view datasets. Control3Diff explicitly models the
underlying latent distribution (optionally conditioned on external inputs),
thus enabling direct control during the diffusion process. Moreover, our
approach is general and applicable to any type of controlling input, allowing
us to train it with the same diffusion objective without any auxiliary
supervision. We validate the efficacy of Control3Diff on standard image
generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various
conditioning inputs such as images, sketches, and text prompts. Please see the
project website (\url{https://jiataogu.me/control3diff}) for video comparisons.
Related papers
- 3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance [61.06034736050515]
We introduce a method capable of generating camera-controlled viewpoints from a single input image.
Our method excels in handling complex and diverse scenes without extensive training or additional 3D and multiview data.
arXiv Detail & Related papers (2024-08-12T13:53:40Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from single image or leveraging scene representation transformer and view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baselines methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models [20.084928490309313]
This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models.
By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model.
The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds.
arXiv Detail & Related papers (2024-03-18T17:59:12Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields.
We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z) - RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and
Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z) - Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D consistent using a novel technique called conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.