Taming Mode Collapse in Score Distillation for Text-to-3D Generation
- URL: http://arxiv.org/abs/2401.00909v2
- Date: Fri, 29 Mar 2024 18:04:37 GMT
- Title: Taming Mode Collapse in Score Distillation for Text-to-3D Generation
- Authors: Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra
- Abstract summary: The "Janus" artifact is a problem in text-to-3D generation where the generated objects fake each view with multiple front faces.
We propose a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD).
Although embarrassingly straightforward, our experiments successfully demonstrate that ESD can be an effective treatment for Janus artifacts in score distillation.
- Score: 70.32101198891465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable performance of score distillation in text-to-3D generation, such techniques notoriously suffer from view inconsistency issues, also known as "Janus" artifact, where the generated objects fake each view with multiple front faces. Although empirically effective methods have approached this problem via score debiasing or prompt engineering, a more rigorous perspective to explain and tackle this problem remains elusive. In this paper, we reveal that the existing score distillation-based text-to-3D generation frameworks degenerate to maximal likelihood seeking on each view independently and thus suffer from the mode collapse problem, manifesting as the Janus artifact in practice. To tame mode collapse, we improve score distillation by re-establishing the entropy term in the corresponding variational objective, which is applied to the distribution of rendered images. Maximizing the entropy encourages diversity among different views in generated 3D assets, thereby mitigating the Janus problem. Based on this new objective, we derive a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD). We theoretically reveal that ESD can be simplified and implemented by just adopting the classifier-free guidance trick upon variational score distillation. Although embarrassingly straightforward, our extensive experiments successfully demonstrate that ESD can be an effective treatment for Janus artifacts in score distillation.
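The abstract states that ESD can be implemented by applying the classifier-free guidance trick on top of variational score distillation, but it does not give the formula. The sketch below illustrates one plausible reading, under the assumption that the trick means a weighted mix of a camera-conditioned and a camera-unconditioned learned score estimate, which is then subtracted from the pretrained score. The function names, the weight `lam`, and the use of plain Python lists as stand-ins for noise predictions are all hypothetical and are not taken from the source.

```python
from typing import List, Sequence


def mix_scores(cond: Sequence[float], uncond: Sequence[float], lam: float) -> List[float]:
    """Classifier-free-guidance-style mix of two score estimates.

    Assumption: `cond` is a learned score conditioned on the camera pose,
    `uncond` is the same learned score without the camera condition, and
    `lam` is a hypothetical mixing weight.
    """
    return [lam * c + (1.0 - lam) * u for c, u in zip(cond, uncond)]


def esd_direction(
    pretrained: Sequence[float],
    learned_cond: Sequence[float],
    learned_uncond: Sequence[float],
    lam: float,
) -> List[float]:
    """Per-element update direction: pretrained score minus the mixed learned score."""
    mixed = mix_scores(learned_cond, learned_uncond, lam)
    return [p - m for p, m in zip(pretrained, mixed)]


if __name__ == "__main__":
    # Toy values only; real score estimates would be image-sized tensors.
    print(esd_direction([1.0, 2.0], [0.5, 1.0], [0.0, 0.0], lam=0.5))
```

Under this reading, `lam = 1.0` uses only the camera-conditioned learned score, and smaller values shift weight toward the camera-unconditioned estimate.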
Related papers
- Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation [32.52588154649761]
We analyze current score distillation methods by connecting theories of consistency distillation to score distillation.
We propose an optimization framework, Guided Consistency Sampling (GCS), integrated with 3D Gaussian Splatting (3DGS) to alleviate those issues.
We introduce a Brightness-Equalized Generation (BEG) scheme in 3DGS rendering to mitigate this issue.
arXiv Detail & Related papers (2024-07-18T15:25:41Z)
- VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation [33.05759961083337]
This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation.
ISD aims to tackle the over-saturation and over-smoothing problems in Score Distillation Sampling (SDS).
arXiv Detail & Related papers (2024-07-13T09:33:16Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated with minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
EG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation.
Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z)
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity [70.32101198891465]
We show that gradient estimation in score distillation inherently suffers from high variance.
We propose a more general solution to reduce variance for score distillation, termed Stein Score Distillation (SSD).
We demonstrate that SteinDreamer achieves faster convergence than existing methods due to more stable gradient updates.
arXiv Detail & Related papers (2023-12-31T23:04:25Z)
- Text-to-3D with Classifier Score Distillation [80.14832887529259]
Classifier-free guidance is usually considered an auxiliary trick rather than the most essential component.
We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
arXiv Detail & Related papers (2023-10-30T10:25:40Z)
- Three Pillars improving Vision Foundation Model Distillation for Lidar [61.56521056618988]
We study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset.
Thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality.
arXiv Detail & Related papers (2023-10-26T15:54:43Z)
- Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation [38.032010026146146]
We propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation.
One of the most notable issues is the Janus problem, where the most canonical view of an object appears in other views.
Our methods improve the realism of the generated 3D objects by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead.
arXiv Detail & Related papers (2023-03-27T17:31:13Z)