Text-to-3D with Classifier Score Distillation
- URL: http://arxiv.org/abs/2310.19415v2
- Date: Tue, 31 Oct 2023 05:44:44 GMT
- Title: Text-to-3D with Classifier Score Distillation
- Authors: Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang,
Xiaojuan Qi
- Abstract summary: Classifier-free guidance is commonly considered an auxiliary trick rather than the most essential component.
We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
- Score: 80.14832887529259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D generation has made remarkable progress recently, particularly
with methods based on Score Distillation Sampling (SDS) that leverage
pre-trained 2D diffusion models. While the usage of classifier-free guidance is
well acknowledged to be crucial for successful optimization, it is considered
an auxiliary trick rather than the most essential component. In this paper, we
re-evaluate the role of classifier-free guidance in score distillation and
discover a surprising finding: the guidance alone is enough for effective
text-to-3D generation tasks. We name this method Classifier Score Distillation
(CSD), which can be interpreted as using an implicit classification model for
generation. This new perspective reveals new insights for understanding
existing techniques. We validate the effectiveness of CSD across a variety of
text-to-3D tasks including shape generation, texture synthesis, and shape
editing, achieving results superior to those of state-of-the-art methods. Our
project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
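The distinction between SDS and CSD can be illustrated with a minimal NumPy sketch. This is not the paper's code: the function names, the guidance weight `omega`, and the omission of the timestep weighting are all illustrative simplifications. SDS pushes rendered images along the full guided noise residual, while CSD keeps only the implicit classifier direction, i.e. the difference between conditional and unconditional noise predictions.

```python
import numpy as np

def sds_grad(eps_cond, eps_uncond, eps_noise, omega=100.0):
    """Standard SDS gradient direction (per pixel) with classifier-free
    guidance weight omega. Inputs are the conditional and unconditional
    noise predictions and the Gaussian noise added at this timestep."""
    guided = eps_cond + omega * (eps_cond - eps_uncond)
    return guided - eps_noise

def csd_grad(eps_cond, eps_uncond):
    """Classifier Score Distillation: keep only the implicit classifier
    direction. The added Gaussian noise term drops out entirely, so no
    eps_noise argument is needed."""
    return eps_cond - eps_uncond
```

Note that for large `omega` (e.g. the commonly used 100), the guidance difference dominates the SDS update anyway, which is the observation motivating CSD.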
Related papers
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation [28.88237230872795]
Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research.
We introduce a novel SDS approach, designed to improve the expressiveness and accuracy of compositional text-to-3D generation.
Our approach integrates new semantic embeddings that maintain consistency across different rendering views.
By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-11T17:26:00Z)
- Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation [67.36775428466045]
We propose Geometry Guided Self-Distillation (GGSD) to learn superior 3D representations from 2D pre-trained models.
Due to the advantages of 3D representation, the performance of the distilled 3D student model can significantly surpass that of the 2D teacher model.
arXiv Detail & Related papers (2024-07-18T10:13:56Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated with minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
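The coarse-to-fine strategy described above can be sketched as a simple step schedule. This is a hypothetical illustration, not VividDreamer's code: the function name, the 50% switchover point, and the linear ramp are all assumptions made for the example.

```python
def pcds_step_schedule(iteration, total_iters, max_steps=4):
    """Hypothetical coarse-to-fine schedule: use 1-step PCDS for the
    first half of optimization to lay down coarse 3D structure, then
    linearly ramp the number of sampling steps up to max_steps so that
    later iterations refine fine-grained details."""
    half = total_iters // 2
    if iteration < half:
        return 1
    frac = (iteration - half) / max(1, half)
    return 1 + round(frac * (max_steps - 1))
```

A training loop would call this once per iteration to decide how many consistency-distillation sampling steps to run.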
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation [40.49322398635262]
We propose the first method to tackle 3D open-vocabulary panoptic segmentation.
Our model takes advantage of the fusion between learnable LiDAR features and dense frozen vision CLIP features.
We propose two novel loss functions: object-level distillation loss and voxel-level distillation loss.
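The two distillation levels can be illustrated with a small sketch. This is an assumption-laden example, not the paper's formulation: the cosine-distance form of the loss, the mean pooling, and all names (`cosine_distill_loss`, `object_level_loss`, `instance_ids`) are hypothetical choices for illustrating voxel-level versus object-level feature distillation against frozen CLIP features.

```python
import numpy as np

def cosine_distill_loss(student, teacher):
    """Mean (1 - cosine similarity) between corresponding rows of the
    student and teacher feature matrices; a common generic form for
    feature-distillation losses (voxel-level when rows are voxels)."""
    s = student / np.linalg.norm(student, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

def object_level_loss(voxel_feats, clip_feats, instance_ids):
    """Object-level variant: pool voxel features within each instance
    first, then align the pooled vectors, emphasizing whole-object
    semantics rather than per-voxel agreement."""
    losses = []
    for i in np.unique(instance_ids):
        m = instance_ids == i
        losses.append(cosine_distill_loss(
            voxel_feats[m].mean(axis=0, keepdims=True),
            clip_feats[m].mean(axis=0, keepdims=True)))
    return float(np.mean(losses))
```

The design intuition is that the voxel-level term transfers dense semantics while the object-level term keeps instance features coherent for panoptic grouping.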
arXiv Detail & Related papers (2024-01-04T18:39:32Z)
- Taming Mode Collapse in Score Distillation for Text-to-3D Generation [70.32101198891465]
The "Janus" artifact is a failure mode in text-to-3D generation where the generated object exhibits multiple front faces, because each rendered view is optimized to look like a front view.
We propose a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD).
Although embarrassingly straightforward, ESD proves in our experiments to be an effective treatment for Janus artifacts in score distillation.
arXiv Detail & Related papers (2023-12-31T22:47:06Z)
- RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation [15.154441074606101]
Score Distillation Sampling (SDS) based rendering has improved 3D asset generation to a great extent.
DDPO3D employs the policy gradient method in tandem with aesthetic scoring to improve 3D rendering from 2D diffusion models.
Our approach is compatible with score distillation-based methods, which would facilitate the integration of diverse reward functions into the generative process.
arXiv Detail & Related papers (2023-12-08T02:41:04Z)
- LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching [33.696757740830506]
Recent advancements in text-to-3D generation have shown promise.
Many methods build on Score Distillation Sampling (SDS).
We propose Interval Score Matching (ISM) to counteract over-smoothing.
arXiv Detail & Related papers (2023-11-19T09:59:09Z)
- Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding [58.924180772480504]
3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query.
We propose to leverage weakly supervised annotations to learn the 3D visual grounding model.
We design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.
arXiv Detail & Related papers (2023-07-18T13:49:49Z)
- Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors [79.80916315953374]
We propose SSP3D, a semi-supervised framework for 3D reconstruction.
We introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
Our approach also performs well when transferring to real-world Pix3D datasets under labeling ratios of 10%.
arXiv Detail & Related papers (2022-09-30T11:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.