Text-to-3D with Classifier Score Distillation
- URL: http://arxiv.org/abs/2310.19415v2
- Date: Tue, 31 Oct 2023 05:44:44 GMT
- Title: Text-to-3D with Classifier Score Distillation
- Authors: Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang,
Xiaojuan Qi
- Abstract summary: Classifier-free guidance is commonly considered an auxiliary trick rather than the most essential component.
We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
- Score: 80.14832887529259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D generation has made remarkable progress recently, particularly
with methods based on Score Distillation Sampling (SDS) that leverage
pre-trained 2D diffusion models. While the usage of classifier-free guidance is
well acknowledged to be crucial for successful optimization, it is considered
an auxiliary trick rather than the most essential component. In this paper, we
re-evaluate the role of classifier-free guidance in score distillation and
discover a surprising finding: the guidance alone is enough for effective
text-to-3D generation tasks. We name this method Classifier Score Distillation
(CSD), which can be interpreted as using an implicit classification model for
generation. This new perspective reveals new insights for understanding
existing techniques. We validate the effectiveness of CSD across a variety of
text-to-3D tasks including shape generation, texture synthesis, and shape
editing, achieving results superior to those of state-of-the-art methods. Our
project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
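The distinction between SDS and CSD can be illustrated with a minimal NumPy sketch. This is not the paper's code: the function names, the guidance weight `omega`, and the omission of the timestep weighting are all illustrative simplifications. SDS pushes rendered images along the full guided noise residual, while CSD keeps only the implicit classifier direction, i.e. the difference between conditional and unconditional noise predictions.

```python
import numpy as np

def sds_grad(eps_cond, eps_uncond, eps_noise, omega=100.0):
    """Standard SDS gradient direction (per pixel) with classifier-free
    guidance weight omega. Inputs are the conditional and unconditional
    noise predictions and the Gaussian noise added at this timestep."""
    guided = eps_cond + omega * (eps_cond - eps_uncond)
    return guided - eps_noise

def csd_grad(eps_cond, eps_uncond):
    """Classifier Score Distillation: keep only the implicit classifier
    direction. The added Gaussian noise term drops out entirely, so no
    eps_noise argument is needed."""
    return eps_cond - eps_uncond
```

Note that for large `omega` (e.g. the commonly used 100), the guidance difference dominates the SDS update anyway, which is the observation motivating CSD.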
Related papers
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation [28.88237230872795]
Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research.
We introduce a novel SDS approach, designed to improve the expressiveness and accuracy of compositional text-to-3D generation.
Our approach integrates new semantic embeddings that maintain consistency across different rendering views.
By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-11T17:26:00Z)
- Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation [67.36775428466045]
We propose Geometry Guided Self-Distillation (GGSD) to learn superior 3D representations from 2D pre-trained models.
Due to the advantages of 3D representation, the performance of the distilled 3D student model can significantly surpass that of the 2D teacher model.
arXiv Detail & Related papers (2024-07-18T10:13:56Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated with minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
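The coarse-to-fine strategy described above can be sketched as a simple step schedule. This is a hypothetical illustration, not VividDreamer's code: the function name, the 50% switchover point, and the linear ramp are all assumptions made for the example.

```python
def pcds_step_schedule(iteration, total_iters, max_steps=4):
    """Hypothetical coarse-to-fine schedule: use 1-step PCDS for the
    first half of optimization to lay down coarse 3D structure, then
    linearly ramp the number of sampling steps up to max_steps so that
    later iterations refine fine-grained details."""
    half = total_iters // 2
    if iteration < half:
        return 1
    frac = (iteration - half) / max(1, half)
    return 1 + round(frac * (max_steps - 1))
```

A training loop would call this once per iteration to decide how many consistency-distillation sampling steps to run.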
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation [40.49322398635262]
We propose the first method to tackle 3D open-vocabulary panoptic segmentation.
Our model takes advantage of the fusion between learnable LiDAR features and dense frozen vision CLIP features.
We propose two novel loss functions: object-level distillation loss and voxel-level distillation loss.
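The two distillation levels can be illustrated with a small sketch. This is an assumption-laden example, not the paper's formulation: the cosine-distance form of the loss, the mean pooling, and all names (`cosine_distill_loss`, `object_level_loss`, `instance_ids`) are hypothetical choices for illustrating voxel-level versus object-level feature distillation against frozen CLIP features.

```python
import numpy as np

def cosine_distill_loss(student, teacher):
    """Mean (1 - cosine similarity) between corresponding rows of the
    student and teacher feature matrices; a common generic form for
    feature-distillation losses (voxel-level when rows are voxels)."""
    s = student / np.linalg.norm(student, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

def object_level_loss(voxel_feats, clip_feats, instance_ids):
    """Object-level variant: pool voxel features within each instance
    first, then align the pooled vectors, emphasizing whole-object
    semantics rather than per-voxel agreement."""
    losses = []
    for i in np.unique(instance_ids):
        m = instance_ids == i
        losses.append(cosine_distill_loss(
            voxel_feats[m].mean(axis=0, keepdims=True),
            clip_feats[m].mean(axis=0, keepdims=True)))
    return float(np.mean(losses))
```

The design intuition is that the voxel-level term transfers dense semantics while the object-level term keeps instance features coherent for panoptic grouping.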
arXiv Detail & Related papers (2024-01-04T18:39:32Z)
- Taming Mode Collapse in Score Distillation for Text-to-3D Generation [70.32101198891465]
The "Janus" artifact is a failure mode in text-to-3D generation where the generated object exhibits multiple front faces, because each rendered view is optimized to look like a front view.
We propose a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD).
Although embarrassingly straightforward, ESD proves in our experiments to be an effective treatment for Janus artifacts in score distillation.
arXiv Detail & Related papers (2023-12-31T22:47:06Z)
- RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation [15.154441074606101]
Score Distillation Sampling (SDS) based rendering has improved 3D asset generation to a great extent.
DDPO3D employs the policy gradient method in tandem with aesthetic scoring to improve 3D rendering from 2D diffusion models.
Our approach is compatible with score distillation-based methods, which would facilitate the integration of diverse reward functions into the generative process.
arXiv Detail & Related papers (2023-12-08T02:41:04Z)
- LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching [33.696757740830506]
Recent advancements in text-to-3D generation have shown promise.
Many methods build on Score Distillation Sampling (SDS).
We propose Interval Score Matching (ISM) to counteract over-smoothing.
arXiv Detail & Related papers (2023-11-19T09:59:09Z)
- Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding [58.924180772480504]
3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query.
We propose to leverage weakly supervised annotations to learn the 3D visual grounding model.
We design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.
arXiv Detail & Related papers (2023-07-18T13:49:49Z)
- Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors [79.80916315953374]
We propose SSP3D, a semi-supervised framework for 3D reconstruction.
We introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
Our approach also performs well when transferring to real-world Pix3D datasets under labeling ratios of 10%.
arXiv Detail & Related papers (2022-09-30T11:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.