DreamBooth3D: Subject-Driven Text-to-3D Generation
- URL: http://arxiv.org/abs/2303.13508v2
- Date: Mon, 27 Mar 2023 15:34:19 GMT
- Title: DreamBooth3D: Subject-Driven Text-to-3D Generation
- Authors: Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz,
Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan
Barron, Yuanzhen Li, Varun Jampani
- Abstract summary: We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject.
We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject.
Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.
- Score: 43.14506066034495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present DreamBooth3D, an approach to personalize text-to-3D generative
models from as few as 3-6 casually captured images of a subject. Our approach
combines recent advances in personalizing text-to-image models (DreamBooth)
with text-to-3D generation (DreamFusion). We find that naively combining these
methods fails to yield satisfactory subject-specific 3D assets due to
personalized text-to-image models overfitting to the input viewpoints of the
subject. We overcome this through a 3-stage optimization strategy where we
jointly leverage the 3D consistency of neural radiance fields together with the
personalization capability of text-to-image models. Our method can produce
high-quality, subject-specific 3D assets with text-driven modifications such as
novel poses, colors and attributes that are not seen in any of the input images
of the subject.
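The abstract outlines a 3-stage optimization that alternates between personalizing a text-to-image diffusion model and optimizing a NeRF. The Python sketch below is a minimal, hypothetical outline of how such a pipeline could be structured, assuming the stages roughly consist of partial personalization of the diffusion model, generation of pseudo multi-view images of the subject, and a final NeRF optimization; every class, function, and hyperparameter name here (DiffusionModel, NeRF, finetune, img2img, optimize_with_sds, the step counts) is an illustrative placeholder, not the authors' released code.

# Hypothetical sketch of a DreamBooth3D-style 3-stage pipeline.
# All names and stage details are illustrative placeholders inferred
# from the abstract, not an official implementation.

from dataclasses import dataclass, field
from typing import List


@dataclass
class DiffusionModel:
    """Stand-in for a text-to-image diffusion model."""
    steps_trained: int = 0

    def finetune(self, images: List[str], prompt: str, steps: int) -> "DiffusionModel":
        # DreamBooth-style personalization on the subject images.
        return DiffusionModel(steps_trained=self.steps_trained + steps)

    def img2img(self, rendering: str, prompt: str) -> str:
        # Translate a NeRF rendering into a subject-faithful image at that viewpoint.
        return f"translated({rendering})"


@dataclass
class NeRF:
    """Stand-in for a neural radiance field."""
    views: List[str] = field(default_factory=list)

    def optimize_with_sds(self, model: DiffusionModel, prompt: str, iters: int) -> None:
        # DreamFusion-style score distillation sampling against `model`.
        self.views = [f"view_{i}" for i in range(4)]

    def optimize_with_recon(self, images: List[str], iters: int) -> None:
        # Additional reconstruction supervision from pseudo multi-view images.
        pass


def dreambooth3d(subject_images: List[str], prompt: str) -> NeRF:
    base = DiffusionModel()

    # Stage 1: lightly personalize the diffusion model, then fit an initial NeRF
    # with SDS; a partially trained model is less overfit to the input viewpoints.
    partial = base.finetune(subject_images, prompt, steps=400)
    nerf = NeRF()
    nerf.optimize_with_sds(partial, prompt, iters=3000)

    # Stage 2: use a fully personalized model to translate NeRF renderings into
    # pseudo multi-view images of the subject.
    full = base.finetune(subject_images, prompt, steps=1200)
    pseudo_views = [full.img2img(v, prompt) for v in nerf.views]

    # Stage 3: optimize the final NeRF with both SDS guidance and reconstruction
    # losses on the pseudo multi-view images.
    final_nerf = NeRF()
    final_nerf.optimize_with_sds(full, prompt, iters=3000)
    final_nerf.optimize_with_recon(pseudo_views, iters=1000)
    return final_nerf


if __name__ == "__main__":
    asset = dreambooth3d(["img_0.jpg", "img_1.jpg", "img_2.jpg"], "a photo of sks dog")
    print(len(asset.views), "views in the optimized asset")

The key idea this sketch tries to capture is that the 3D consistency of the NeRF and the personalization capability of the text-to-image model are leveraged jointly, rather than simply chaining DreamBooth into DreamFusion once.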
Related papers
- Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation [12.693847842218604]
We introduce a novel 3D customization method, dubbed Make-Your-3D, that can personalize high-fidelity and consistent 3D content within 5 minutes.
Our key insight is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning them with the distribution of the desired 3D subject.
Our method can produce high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in the subject images.
arXiv Detail & Related papers (2024-03-14T17:57:04Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding [15.341857735842954]
Existing text-to-3D methods tend to suffer from mode collapse, and hence poor diversity in their results.
We propose a new method that considers the joint generation of different 3D models from the same text prompt.
We show that our method leads to improved diversity in text-to-3D synthesis qualitatively and quantitatively.
arXiv Detail & Related papers (2023-12-02T08:21:20Z)
- Control3D: Towards Controllable Text-to-3D Generation [107.81136630589263]
We present Control3D, a text-to-3D generation approach conditioned on an additional hand-drawn sketch.
A 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF.
We exploit a pre-trained differentiable photo-to-sketch model to directly estimate the sketch of the image rendered from the synthetic 3D scene.
arXiv Detail & Related papers (2023-11-09T15:50:32Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- ATT3D: Amortized Text-to-3D Object Synthesis [78.96673650638365]
We amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately.
Our framework - Amortized Text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and to interpolate smoothly between prompts for novel assets and simple animations.
arXiv Detail & Related papers (2023-06-06T17:59:10Z)
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation [45.69270771487455]
We propose a new method of Fantasia3D for high-quality text-to-3D content creation.
Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance.
Our framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets.
arXiv Detail & Related papers (2023-03-24T09:30:09Z)
- 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation [107.46972849241168]
The 3D-TOGO model generates 3D objects in the form of neural radiance fields with good texture.
Experiments on the largest 3D object dataset (i.e., ABO) verify that 3D-TOGO generates higher-quality 3D objects than prior methods.
arXiv Detail & Related papers (2022-12-02T11:31:49Z)