Efficient Text-Guided 3D-Aware Portrait Generation with Score
Distillation Sampling on Distribution
- URL: http://arxiv.org/abs/2306.02083v1
- Date: Sat, 3 Jun 2023 11:08:38 GMT
- Title: Efficient Text-Guided 3D-Aware Portrait Generation with Score
Distillation Sampling on Distribution
- Authors: Yiji Cheng, Fei Yin, Xiaoke Huang, Xintong Yu, Jiaxiang Liu, Shikun
Feng, Yujiu Yang, Yansong Tang
- Abstract summary: We propose DreamPortrait, which aims to generate text-guided 3D-aware portraits in a single forward pass for efficiency.
We further design a 3D-aware gated cross-attention mechanism to explicitly let the model perceive the correspondence between the text and the 3D-aware space.
- Score: 28.526714129927093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-3D is an emerging task that allows users to create 3D content with
infinite possibilities. Existing works tackle the problem by optimizing a 3D
representation with guidance from pre-trained diffusion models. An apparent
drawback is that they need to optimize from scratch for each prompt, which is
computationally expensive and often yields poor visual fidelity. In this paper,
we propose DreamPortrait, which aims to generate text-guided 3D-aware portraits
in a single forward pass for efficiency. To achieve this, we extend Score
Distillation Sampling from datapoint to distribution formulation, which injects
semantic prior into a 3D distribution. However, the direct extension will lead
to the mode collapse problem since the objective only pursues semantic
alignment. Hence, we propose to optimize a distribution with hierarchical
condition adapters and GAN loss regularization. For better 3D modeling, we
further design a 3D-aware gated cross-attention mechanism to explicitly let the
model perceive the correspondence between the text and the 3D-aware space.
These elaborated designs enable our model to generate portraits with robust
multi-view semantic consistency, eliminating the need for optimization-based
methods. Extensive experiments demonstrate our model's highly competitive
performance and significant speed boost against existing methods.
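For readers unfamiliar with Score Distillation Sampling, the standard per-datapoint formulation from DreamFusion backpropagates a gradient of the form w(t)(epsilon_hat(x_t; y, t) - epsilon) through the rendering. The snippet below is a minimal PyTorch-style sketch of how such a loss might instead be applied to a batch of renderings sampled from a conditional generator, i.e. a distribution, rather than to a single optimized 3D representation; `generator`, `diffusion_unet`, and `alphas_cumprod` are hypothetical placeholders rather than DreamPortrait's actual code, and the hierarchical condition adapters and GAN regularization described above are omitted.

```python
import torch

def sds_on_distribution_loss(generator, diffusion_unet, alphas_cumprod,
                             text_emb, latent_dim, batch_size=4):
    """Sketch: score-distillation loss applied to a batch of renderings sampled
    from a generator (a distribution over 3D-aware portraits), instead of a
    single optimized 3D representation."""
    # Sample a batch of renderings from the generator distribution.
    z = torch.randn(batch_size, latent_dim)
    images = generator(z, text_emb)               # (B, C, H, W), roughly in [-1, 1]

    # Diffusion forward process: corrupt the renderings at random timesteps t.
    t = torch.randint(20, 980, (batch_size,))
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(images)
    noisy = alpha_bar.sqrt() * images + (1.0 - alpha_bar).sqrt() * noise

    # Frozen text-to-image diffusion model predicts the added noise.
    with torch.no_grad():
        noise_pred = diffusion_unet(noisy, t, text_emb)

    # SDS-style gradient w(t) * (eps_hat - eps), treated as a constant target;
    # gradients flow only through the generator's renderings.
    grad = ((1.0 - alpha_bar) * (noise_pred - noise)).detach()
    return (grad * images).sum() / batch_size
```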
Related papers
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated with minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
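The coarse-to-fine strategy above (start with 1-step PCDS, then gradually increase the number of steps) can be pictured with a simple schedule like the one below; this is only an illustrative sketch under that reading, and `apply_pcds_update` is a hypothetical placeholder for a single PCDS optimization step, not the paper's released code.

```python
def pcds_steps_for_iteration(it, total_iters, max_steps=4):
    """Hypothetical coarse-to-fine schedule: 1-step PCDS for the first half of
    optimization (basic structure), then gradually more steps for fine details."""
    half = max(total_iters // 2, 1)
    if it < half:
        return 1
    frac = (it - half) / half  # ramps 0 -> 1 over the second half
    return 1 + int(frac * (max_steps - 1))

# Usage sketch (hypothetical):
# for it in range(total_iters):
#     n = pcds_steps_for_iteration(it, total_iters)
#     apply_pcds_update(scene, n_steps=n)
```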
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy incorporates an explicit camera-orientation conditioning feature into the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than that of existing methods.
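A common way to condition a 2D diffusion U-Net on camera orientation is to embed the camera angles and add the result to the timestep embedding; the sketch below illustrates only that generic idea, under the assumption of azimuth/elevation inputs, and is not OrientDream's actual architecture.

```python
import torch
import torch.nn as nn

class CameraOrientationEmbedding(nn.Module):
    """Generic sketch: map camera (azimuth, elevation) angles, given in radians,
    to an embedding that could be added to a diffusion U-Net's timestep embedding."""
    def __init__(self, dim=320):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, azimuth, elevation):
        # sin/cos encoding keeps the embedding continuous and periodic in angle.
        feats = torch.stack([torch.sin(azimuth), torch.cos(azimuth),
                             torch.sin(elevation), torch.cos(elevation)], dim=-1)
        return self.mlp(feats)

# Usage sketch (hypothetical): time_emb = time_emb + cam_emb(azimuth, elevation)
```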
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image [28.759158325097093]
Unique3D is a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images.
Our framework features state-of-the-art generation fidelity and strong generalizability.
arXiv Detail & Related papers (2024-05-30T17:59:54Z)
- DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow [72.9209434105892]
We propose to enhance the text-to-3D optimization by leveraging the T2I diffusion prior in the generative sampling process with a predetermined timestep schedule.
By leveraging the proposed novel optimization algorithm, we design DreamFlow, a practical three-stage coarse-to-fine text-to-3D optimization framework.
arXiv Detail & Related papers (2024-03-22T05:38:15Z)
- Spice-E : Structural Priors in 3D Diffusion using Cross-Entity Attention [9.52027244702166]
Spice-E is a neural network that adds structural guidance to 3D diffusion models.
We show that our approach supports a variety of applications, including 3D stylization, semantic shape editing and text-conditional abstraction-to-3D.
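As a rough illustration of cross-entity attention, one can let the tokens of the shape being generated attend to the tokens of a separate guidance shape; the layer below is a generic sketch of that pattern, not Spice-E's implementation.

```python
import torch.nn as nn

class CrossEntityAttention(nn.Module):
    """Sketch: tokens of the shape being generated attend to tokens of a
    guidance shape that carries the structural prior. Placeholder layer only."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, gen_tokens, guide_tokens):
        # Queries come from the generated entity; keys/values from the guidance entity.
        attended, _ = self.attn(query=gen_tokens, key=guide_tokens, value=guide_tokens)
        return self.norm(gen_tokens + attended)  # residual connection
```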
arXiv Detail & Related papers (2023-11-29T17:36:49Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
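Since the 2D diffusion model described above emits a single image packing four sub-images, downstream 3D synthesis needs to split the composite back into views; the helper below assumes a simple 2x2 packing, which may differ from the paper's actual layout.

```python
import torch

def split_orthogonal_views(composite: torch.Tensor):
    """Sketch: split a composite (C, 2H, 2W) image that packs four orthogonal
    views into a 2x2 grid back into four individual (C, H, W) views.
    The actual packing used by the paper may differ."""
    c, h2, w2 = composite.shape
    h, w = h2 // 2, w2 // 2
    return [composite[:, :h, :w],   # top-left sub-image (e.g. front view)
            composite[:, :h, w:],   # top-right sub-image
            composite[:, h:, :w],   # bottom-left sub-image
            composite[:, h:, w:]]   # bottom-right sub-image
```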
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation [24.042803966469066]
We show that the conflict between the 3D optimization process and uniform timestep sampling in score distillation is the main reason for these limitations.
We propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns the 3D optimization process with the sampling process of the diffusion model.
Our simple redesign significantly improves 3D content creation with faster convergence, better quality and diversity.
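The monotonically non-increasing timestep prioritization described above can be illustrated by a schedule that maps optimization progress to a diffusion timestep that only decreases over training; the function below is a minimal sketch under that reading, not DreamTime's exact weighting function.

```python
def timestep_for_iteration(it, total_iters, t_max=980, t_min=20):
    """Sketch: a monotonically non-increasing timestep schedule. Early iterations
    use large (noisy) timesteps to shape coarse structure; later iterations use
    small timesteps to refine details."""
    frac = it / max(total_iters - 1, 1)
    return int(round(t_max - frac * (t_max - t_min)))

# Usage sketch: t = timestep_for_iteration(it, total_iters), then use t in the
# score-distillation update instead of sampling t uniformly.
```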
arXiv Detail & Related papers (2023-06-21T17:59:45Z)
- DreamFusion: Text-to-3D using 2D Diffusion [52.52529213936283]
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs.
In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
arXiv Detail & Related papers (2022-09-29T17:50:40Z)