Control3Diff: Learning Controllable 3D Diffusion Models from Single-view
Images
- URL: http://arxiv.org/abs/2304.06700v2
- Date: Thu, 26 Oct 2023 05:04:17 GMT
- Title: Control3Diff: Learning Controllable 3D Diffusion Models from Single-view
Images
- Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu and
Josh Susskind
- Abstract summary: Control3Diff is a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis.
We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet.
- Score: 70.17085345196583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently become the de-facto approach for generative
modeling in the 2D domain. However, extending diffusion models to 3D is
challenging due to the difficulties in acquiring 3D ground truth data for
training. On the other hand, 3D GANs that integrate implicit 3D representations
into GANs have shown remarkable 3D-aware generation when trained only on
single-view image datasets. However, 3D GANs do not provide straightforward
ways to precisely control image synthesis. To address these challenges, we
present Control3Diff, a 3D diffusion model that combines the strengths of
diffusion models and 3D GANs for versatile, controllable 3D-aware image
synthesis for single-view datasets. Control3Diff explicitly models the
underlying latent distribution (optionally conditioned on external inputs),
thus enabling direct control during the diffusion process. Moreover, our
approach is general and applicable to any type of controlling input, allowing
us to train it with the same diffusion objective without any auxiliary
supervision. We validate the efficacy of Control3Diff on standard image
generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various
conditioning inputs such as images, sketches, and text prompts. Please see the
project website (\url{https://jiataogu.me/control3diff}) for video comparisons.
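The abstract describes the core recipe only at a high level: run a diffusion model over the latent space of a pretrained 3D GAN, optionally conditioned on an external control input, using the standard diffusion objective. Below is a minimal, hedged sketch of that idea in PyTorch; it is not the authors' code, and the module names, latent dimensions, noise schedule, and conditioning encoder are all assumptions made for illustration.

```python
# Sketch (assumed, not Control3Diff's actual implementation): a conditional
# denoiser trained with an epsilon-prediction DDPM-style loss over the latent
# codes of a frozen 3D GAN.

import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a GAN latent, given a timestep and an
    optional conditioning embedding (e.g. from an image, sketch, or text)."""
    def __init__(self, latent_dim=512, cond_dim=512, hidden=1024):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.cond_embed = nn.Linear(cond_dim, hidden)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_t, t, cond=None):
        h = self.time_embed(t[:, None].float())
        if cond is not None:
            # Same objective with or without conditioning; the control signal
            # simply shifts the timestep embedding.
            h = h + self.cond_embed(cond)
        return self.net(torch.cat([z_t, h], dim=-1))

def diffusion_loss(model, z0, cond=None, T=1000):
    """Standard epsilon-prediction objective applied to GAN latents."""
    b = z0.shape[0]
    t = torch.randint(0, T, (b,), device=z0.device)
    betas = torch.linspace(1e-4, 0.02, T, device=z0.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t][:, None]
    eps = torch.randn_like(z0)
    z_t = alpha_bar.sqrt() * z0 + (1.0 - alpha_bar).sqrt() * eps
    return ((model(z_t, t, cond) - eps) ** 2).mean()

# Usage: z0 would come from a pretrained 3D GAN's latent space and cond from
# any encoder for the chosen control input; both are placeholders here.
model = LatentDenoiser()
z0 = torch.randn(4, 512)
cond = torch.randn(4, 512)
loss = diffusion_loss(model, z0, cond)
loss.backward()
```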
Related papers
- DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation [33.62074896816882]
DiffSplat is a novel 3D generative framework that generates 3D Gaussian splats by taming large-scale text-to-image diffusion models.
It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model.
In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views.
arXiv Detail & Related papers (2025-01-28T07:38:59Z)
- Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation [66.75243908044538]
We introduce Zero-1-to-G, a novel approach to direct 3D generation on Gaussian splats using pretrained 2D diffusion models.
To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats.
This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects.
arXiv Detail & Related papers (2025-01-09T18:37:35Z)
- GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space [64.82017974849697]
We train a feed-forward text-to-3D diffusion generator for human characters using only single-view 2D data for supervision.
GANFusion starts by generating unconditional triplane features for 3D data using a GAN architecture trained with only single-view 2D data.
arXiv Detail & Related papers (2024-12-21T17:59:17Z)
- MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework that generates consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baseline methods in evaluation metrics, including PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields.
We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)