DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model
- URL: http://arxiv.org/abs/2408.02993v2
- Date: Fri, 9 Aug 2024 14:12:49 GMT
- Title: DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model
- Authors: Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei
- Abstract summary: We propose DreamLCM, which incorporates the Latent Consistency Model (LCM) to generate consistent and high-quality guidance.
The proposed method can provide accurate and detailed gradients to optimize the target 3D models.
DreamLCM achieves state-of-the-art results in both generation quality and training efficiency.
- Score: 77.84225358245487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The text-to-3D task has recently developed rapidly following the introduction of Score Distillation Sampling (SDS). However, SDS tends to generate 3D objects of poor quality due to an over-smoothing issue. This issue is attributed to two factors: 1) single-step DDPM inference produces poor guidance gradients; 2) the randomness of the input noises and timesteps averages out the details of the 3D contents. To address this issue, we propose DreamLCM, which incorporates the Latent Consistency Model (LCM). DreamLCM leverages the powerful image generation capability inherent in LCM, enabling it to generate consistent and high-quality guidance, i.e., predicted noises or images. With this improved guidance, the proposed method provides accurate and detailed gradients to optimize the target 3D models. In addition, we propose two strategies to further enhance generation quality. First, we propose a guidance calibration strategy that uses the Euler solver to calibrate the guidance distribution and accelerate convergence of the 3D models. Second, we propose a dual timestep strategy that increases the consistency of the guidance and optimizes the 3D models from geometry to appearance. Experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency. The code is available at https://github.com/1YimingZhong/DreamLCM.
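For context, SDS-style pipelines optimize the 3D parameters by back-propagating a guidance residual from a frozen 2D diffusion model. The well-known SDS gradient is reproduced below, together with a hedged sketch of how, per the abstract, the DDPM single-step prediction would be replaced by a one-step LCM prediction; the symbol $\hat{\epsilon}_{\mathrm{LCM}}$ is our notation for illustration, not necessarily the paper's.

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
\qquad x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
$$

$$
\nabla_\theta \mathcal{L}_{\mathrm{DreamLCM}} \approx \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_{\mathrm{LCM}}(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
$$

where $x$ is the rendered (latent) image of the 3D model with parameters $\theta$, $y$ is the text prompt, $w(t)$ is a timestep-dependent weight, and $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction. Under this reading, cleaner and more consistent predictions from the LCM translate directly into less noisy gradients for the 3D model.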
Related papers
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated through minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models [29.818123424954294]
Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling.
We propose a Latent Consistency Model (LCM) adaptation for the task at hand.
We analyze the strengths and weaknesses of the proposed model and evaluate it quantitatively and qualitatively.
arXiv Detail & Related papers (2024-06-17T04:40:07Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion [88.02512124661884]
We propose Magic-Boost, a multi-view conditioned diffusion model that significantly refines coarse generative results.
Compared to previous text- or single-image-based diffusion models, Magic-Boost exhibits a robust capability to generate images with high consistency.
It provides precise SDS guidance that well aligns with the identity of the input images, enriching the local detail in both geometry and texture of the initial generative results.
arXiv Detail & Related papers (2024-04-09T16:20:03Z)
- Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
- BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion [0.0]
BoostDream is a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones.
We introduce 3D model distillation that fits differentiable representations from the 3D assets obtained through feed-forward generation.
A novel multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion model to refine the 3D assets.
arXiv Detail & Related papers (2024-01-30T05:59:00Z)
- DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior [97.694840981611]
We propose a two-stage 2D-lifting framework, namely DreamControl.
It generates fine-grained objects with control-based score distillation.
DreamControl can generate high-quality 3D content in terms of both geometry consistency and texture fidelity.
arXiv Detail & Related papers (2023-12-11T15:12:50Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust, high-quality 3D content generation pipeline that exploits orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)