DreamFusion: Text-to-3D using 2D Diffusion
- URL: http://arxiv.org/abs/2209.14988v1
- Date: Thu, 29 Sep 2022 17:50:40 GMT
- Title: DreamFusion: Text-to-3D using 2D Diffusion
- Authors: Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall
- Abstract summary: Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs.
In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
- Score: 52.52529213936283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in text-to-image synthesis have been driven by diffusion
models trained on billions of image-text pairs. Adapting this approach to 3D
synthesis would require large-scale datasets of labeled 3D data and efficient
architectures for denoising 3D data, neither of which currently exist. In this
work, we circumvent these limitations by using a pretrained 2D text-to-image
diffusion model to perform text-to-3D synthesis. We introduce a loss based on
probability density distillation that enables the use of a 2D diffusion model
as a prior for optimization of a parametric image generator. Using this loss in
a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a
Neural Radiance Field, or NeRF) via gradient descent such that its 2D
renderings from random angles achieve a low loss. The resulting 3D model of the
given text can be viewed from any angle, relit by arbitrary illumination, or
composited into any 3D environment. Our approach requires no 3D training data
and no modifications to the image diffusion model, demonstrating the
effectiveness of pretrained image diffusion models as priors.
Related papers
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z) - GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z) - DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z) - Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding [16.50466940644004]
We present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input.
Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by solely resting on the SDS loss.
arXiv Detail & Related papers (2024-03-15T15:27:58Z) - 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation
arXiv Detail & Related papers (2024-03-04T17:26:28Z) - GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z) - Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D
Generation [39.50894560861625]
3DFuse is a novel framework that incorporates 3D awareness into pretrained 2D diffusion models.
We introduce a training strategy that enables the 2D diffusion model learns to handle the errors and sparsity within the coarse 3D structure for robust generation.
arXiv Detail & Related papers (2023-03-14T14:24:31Z) - A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware
Image Synthesis [163.96778522283967]
We propose a shading-guided generative implicit model that is able to learn a starkly improved shape representation.
An accurate 3D shape should also yield a realistic rendering under different lighting conditions.
Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis.
arXiv Detail & Related papers (2021-10-29T10:53:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.