Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and
Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2212.14704v2
- Date: Mon, 3 Apr 2023 15:55:40 GMT
- Title: Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and
Text-to-Image Diffusion Models
- Authors: Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu
Qie, Shenghua Gao
- Abstract summary: We introduce explicit 3D shape priors into the CLIP-guided 3D optimization process.
We present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model.
Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy.
- Score: 44.34479731617561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent CLIP-guided 3D optimization methods, such as DreamFields and
PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D
synthesis. However, because they are trained from scratch with random
initialization and no prior knowledge, these methods often fail to generate
accurate and faithful 3D
structures that conform to the input text. In this paper, we make the first
attempt to introduce explicit 3D shape priors into the CLIP-guided 3D
optimization process. Specifically, we first generate a high-quality 3D shape
from the input text in the text-to-shape stage as a 3D shape prior. We then use
it as the initialization of a neural radiance field and optimize it with the
full prompt. To address the challenging text-to-shape generation task, we
present a simple yet effective approach that directly bridges the text and
image modalities with a powerful text-to-image diffusion model. To narrow the
style domain gap between the images synthesized by the text-to-image diffusion
model and shape renderings used to train the image-to-shape generator, we
further propose to jointly optimize a learnable text prompt and fine-tune the
text-to-image diffusion model for rendering-style image generation. Our method,
Dream3D, is capable of generating imaginative 3D content with superior visual
quality and shape accuracy compared to state-of-the-art methods.
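For context, the CLIP-guided optimization shared by DreamFields, PureCLIPNeRF, and the second stage of Dream3D scores differentiably rendered views of the 3D representation against the text prompt with CLIP and maximizes their similarity. The following is a minimal, hedged sketch of such an objective using the Hugging Face transformers CLIP API; the renderer call and the example prompt are illustrative assumptions, not Dream3D's actual implementation.

import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# CLIP's expected input normalization statistics.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_guidance_loss(rendered, prompt):
    # rendered: (B, 3, H, W) differentiably rendered views, values in [0, 1].
    images = F.interpolate(rendered, size=224, mode="bilinear", align_corners=False)
    images = (images - CLIP_MEAN) / CLIP_STD
    image_features = clip_model.get_image_features(pixel_values=images)
    tokens = tokenizer([prompt], padding=True, return_tensors="pt").to(device)
    text_features = clip_model.get_text_features(**tokens)
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # Negative cosine similarity: minimizing it pulls the renderings toward the prompt.
    return -(image_features * text_features).sum(dim=-1).mean()

# Hypothetical usage with a NeRF initialized from the text-to-shape prior:
# views = nerf.render_random_views(batch_size=4)  # assumed renderer API, not from the paper
# loss = clip_guidance_loss(views, "a wooden chair shaped like an avocado")
# loss.backward()

In Dream3D, the key difference from scratch-trained methods is that the radiance field being optimized with this kind of objective is initialized from the text-to-shape prior rather than from random weights.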
Related papers
- RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [39.03289977892935]
RealmDreamer is a technique for generating general forward-facing 3D scenes from text descriptions.
Our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles.
arXiv Detail & Related papers (2024-04-10T17:57:41Z)
- 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation.
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
- PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion [18.82883336156591]
We present PI3D, a framework that fully leverages the pre-trained text-to-image diffusion models' ability to generate high-quality 3D shapes from text prompts in minutes.
PI3D generates a single 3D shape from a text prompt in only 3 minutes, and its quality is shown to outperform existing 3D generative models by a large margin.
arXiv Detail & Related papers (2023-12-14T16:04:34Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- ATT3D: Amortized Text-to-3D Object Synthesis [78.96673650638365]
We amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately.
Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text prompts for novel assets and simple animations.
arXiv Detail & Related papers (2023-06-06T17:59:10Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates (a minimal sketch of this idea appears after this list).
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
- Text to Mesh Without 3D Supervision Using Limit Subdivision [13.358081015190255]
We present a technique for zero-shot generation of a 3D model using only a target text prompt.
We rely on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model.
arXiv Detail & Related papers (2022-03-24T20:36:28Z)
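The TAPS3D entry above describes building pseudo captions by retrieving words from the CLIP vocabulary for rendered 2D images and filling a template. Below is a minimal, hedged sketch of that retrieval idea; the mini-vocabulary, caption template, and file name are illustrative assumptions rather than the paper's actual setup.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative mini-vocabulary; TAPS3D retrieves from the full CLIP vocabulary.
vocab = ["wooden", "metal", "red", "chair", "table", "airplane", "lamp", "round"]

def pseudo_caption(rendering, top_k=2):
    # Score every candidate word against the rendered image with CLIP.
    inputs = processor(text=vocab, images=rendering, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (1, len(vocab)): image-to-word similarity scores.
    scores = outputs.logits_per_image[0]
    words = [vocab[i] for i in scores.topk(top_k).indices.tolist()]
    # Fill a simple caption template with the retrieved words.
    return "a 3D rendering of a " + " ".join(words)

# Hypothetical usage on a rendered shape image:
# caption = pseudo_caption(Image.open("shape_render.png"))

Such pseudo captions then serve as high-level semantic supervision for training the text-guided 3D shape generator, as described in the TAPS3D entry.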
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.