ATT3D: Amortized Text-to-3D Object Synthesis
- URL: http://arxiv.org/abs/2306.07349v1
- Date: Tue, 6 Jun 2023 17:59:10 GMT
- Title: ATT3D: Amortized Text-to-3D Object Synthesis
- Authors: Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki
Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James
Lucas
- Abstract summary: We amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately.
Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.
- Score: 78.96673650638365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D modelling has seen exciting progress by combining generative
text-to-image models with image-to-3D methods like Neural Radiance Fields.
DreamFusion recently achieved high-quality results but requires a lengthy,
per-prompt optimization to create 3D objects. To address this, we amortize
optimization over text prompts by training on many prompts simultaneously with
a unified model, instead of separately. With this, we share computation across
a prompt set, training in less time than per-prompt optimization. Our framework
- Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to
generalize to unseen setups and smooth interpolations between text for novel
assets and simple animations.
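The core idea is easy to sketch: instead of optimizing one 3D representation per prompt, a single network maps each text embedding to prompt-specific 3D parameters, and Score Distillation Sampling (SDS) gradients from a frozen text-to-image diffusion model are backpropagated through rendering into that shared network. The following PyTorch sketch is purely illustrative and is not the paper's implementation: the mapping network, dimensions, renderer, and diffusion guidance are hypothetical stubs.
```python
# Minimal, illustrative sketch of amortized text-to-3D training.
# All names, shapes, and modules are hypothetical simplifications,
# NOT the ATT3D architecture; the renderer and diffusion guidance
# are stubbed out so the script runs self-contained.
import torch
import torch.nn as nn

class TextToParams(nn.Module):
    """Hypothetical mapping network: one text embedding -> one set of
    per-prompt 3D-representation parameters, shared across all prompts."""
    def __init__(self, text_dim=64, param_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, 512), nn.SiLU(), nn.Linear(512, param_dim)
        )

    def forward(self, text_emb):
        return self.net(text_emb)

def render(params, camera):
    """Stand-in for differentiable volume rendering of the prompt-specific
    3D representation; returns a toy 8x8 RGB image for illustration."""
    img = params[:, :192].reshape(-1, 3, 8, 8) * camera
    return torch.sigmoid(img)

def sds_grad(images):
    """Stand-in for the SDS gradient from a frozen text-to-image diffusion
    model; a real model would score noised renders, not return noise."""
    return torch.randn_like(images)

model = TextToParams()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One unified model is optimized over a batch of prompts at once,
# instead of re-running a full per-prompt optimization.
for step in range(100):
    text_emb = torch.randn(16, 64)    # 16 prompts' (mock) embeddings
    camera = torch.rand(16, 1, 1, 1)  # random camera per prompt
    images = render(model(text_emb), camera)
    # Backpropagate the guidance gradient through rendering into the
    # shared mapping network, so prompts share computation.
    images.backward(gradient=sds_grad(images))
    opt.step()
    opt.zero_grad()
```
Because the same mapping network serves every prompt, gradient updates for one prompt also improve the others, which is what lets an amortized model generalize to unseen prompts and interpolate between them.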
Related papers
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [39.03289977892935]
RealmDreamer is a technique for generating general forward-facing 3D scenes from text descriptions.
Our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles.
arXiv Detail & Related papers (2024-04-10T17:57:41Z)
- LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis [76.43669909525488]
We introduce LATTE3D, which addresses the detail and scalability limitations of prior amortized methods to achieve fast, high-quality generation on a significantly larger prompt set.
LATTE3D generates 3D objects in 400 ms, and results can be further enhanced with fast test-time optimization.
arXiv Detail & Related papers (2024-03-22T17:59:37Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding [15.341857735842954]
Existing text-to-3D methods tend to suffer from mode collapse, and hence poor diversity in their results.
We propose a new method that considers the joint generation of different 3D models from the same text prompt.
We show that our method leads to improved diversity in text-to-3D synthesis qualitatively and quantitatively.
arXiv Detail & Related papers (2023-12-02T08:21:20Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling [91.99172731031206]
Current text-to-4D methods face a three-way tradeoff between quality of scene appearance, 3D structure, and motion.
We introduce hybrid score distillation sampling, an alternating optimization procedure that blends supervision signals from multiple pre-trained diffusion models; a toy sketch of this alternation appears after this list.
arXiv Detail & Related papers (2023-11-29T18:58:05Z)
- DreamBooth3D: Subject-Driven Text-to-3D Generation [43.14506066034495]
We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject.
We find that naively combining personalized text-to-image models with text-to-3D generation fails to yield satisfactory subject-specific 3D assets, because the personalized models overfit to the input viewpoints of the subject.
Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.
arXiv Detail & Related papers (2023-03-23T17:59:00Z)
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models [44.34479731617561]
We introduce explicit 3D shape priors into the CLIP-guided 3D optimization process.
We present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model.
Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy.
arXiv Detail & Related papers (2022-12-28T18:23:47Z)
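As referenced in the 4D-fy entry above, the following is a schematic sketch of what an alternating "hybrid" score distillation loop can look like. It is a hypothetical illustration, not 4D-fy's implementation: the three guidance functions are mock stand-ins for pre-trained diffusion models (for instance text-to-image, 3D-aware multi-view, and text-to-video models), and the round-robin schedule and toy scene parameters are assumptions.
```python
# Schematic sketch of alternating "hybrid" score distillation (after the
# 4D-fy summary above). All names and the schedule are illustrative.
import torch

scene = torch.randn(3, 8, 8, requires_grad=True)  # toy "4D scene" params
opt = torch.optim.Adam([scene], lr=1e-3)

def image_sds(x):   # mock: appearance supervision from a T2I model
    return torch.randn_like(x)

def threed_sds(x):  # mock: multi-view / 3D-structure supervision
    return torch.randn_like(x)

def video_sds(x):   # mock: motion supervision from a text-to-video model
    return torch.randn_like(x)

sources = [image_sds, threed_sds, video_sds]
for step in range(300):
    guidance = sources[step % len(sources)]  # alternate supervision signals
    opt.zero_grad()
    scene.grad = guidance(scene)  # inject the SDS-style gradient directly
    opt.step()
```
In a real system each stand-in would noise the rendered views, query the corresponding frozen diffusion model, and return the resulting SDS gradient; here random tensors merely mark where those signals enter the loop.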