Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
- URL: http://arxiv.org/abs/2507.09748v1
- Date: Sun, 13 Jul 2025 18:57:45 GMT
- Title: Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
- Authors: Yu Lei, Bingde Liu, Qingsong Xie, Haonan Lu, Zhijie Deng
- Abstract summary: We propose a linearized variant of the model for score distillation, giving rise to the Linearized Lookahead Variational Score Distillation ($L^2$-VSD). $L^2$-VSD can be realized efficiently with forward-mode autodiff functionalities of existing deep learning libraries. We also show that our method can be seamlessly incorporated into any other VSD-based text-to-3D framework.
- Score: 10.863222482923605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-3D generation based on score distillation of pre-trained 2D diffusion models has gained increasing interest, with variational score distillation (VSD) as a remarkable example. VSD shows that vanilla score distillation can be improved by introducing an extra score-based model, which characterizes the distribution of images rendered from 3D models, to correct the distillation gradient. Despite its theoretical foundations, VSD in practice is prone to slow and sometimes ill-posed convergence. In this paper, we perform an in-depth investigation of the interplay between the introduced score model and the 3D model, and find a mismatch between the LoRA and 3D distributions in practical implementations. Simply adjusting their optimization order improves generation quality: the score model then looks ahead to the current 3D state and hence yields more reasonable corrections. Nevertheless, naive lookahead VSD may suffer from unstable training in practice due to potential over-fitting. To address this, we propose to use a linearized variant of the model for score distillation, giving rise to the Linearized Lookahead Variational Score Distillation ($L^2$-VSD). $L^2$-VSD can be realized efficiently with forward-mode autodiff functionalities of existing deep learning libraries. Extensive experiments validate the efficacy of $L^2$-VSD, revealing its clear superiority over prior score distillation-based methods. We also show that our method can be seamlessly incorporated into any other VSD-based text-to-3D framework.
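The sketch below is a minimal illustration (not the authors' released code) of how the linearized lookahead could be realized with PyTorch's forward-mode autodiff. In ProlificDreamer-style VSD, the distillation direction is $w(t)\,(\epsilon_\text{pretrain}(x_t, t, y) - \epsilon_\phi(x_t, t, y))$ propagated to the 3D parameters through the render; here the LoRA score $\epsilon_\phi$ is replaced by a first-order Taylor estimate along its pending update, so it "looks ahead" to the current 3D state without committing the step. All identifiers (`eps_pre`, `eps_lora`, `lora_params`, `lora_tangent`, the schedule tensors) are assumptions for illustration.

```python
import torch
from torch.func import functional_call, jvp

def l2_vsd_direction(eps_pre, eps_lora, lora_params, lora_tangent,
                     x, t, y, alphas, sigmas):
    """Sketch of one linearized-lookahead VSD direction for a rendered image x."""
    noise = torch.randn_like(x)
    x_t = alphas[t] * x + sigmas[t] * noise        # diffuse the render to step t

    with torch.no_grad():
        eps_target = eps_pre(x_t, t, y)            # frozen pretrained score

    # Linearized lookahead: a jvp gives the first-order change of the LoRA
    # score along its pending parameter update, approximating a LoRA that
    # has already been refreshed on the current 3D state.
    def fake_score(params):
        return functional_call(eps_lora, params, (x_t, t, y))

    eps_fake, eps_delta = jvp(fake_score, (lora_params,), (lora_tangent,))
    eps_lookahead = eps_fake + eps_delta           # Taylor estimate of stepped LoRA

    return (eps_target - eps_lookahead).detach()   # VSD-style correction term
```

In a full loop, one would back-propagate `w_t * direction` into the 3D parameters through the render `x`, while the LoRA itself keeps training with the usual diffusion loss on renders.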
Related papers
- Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching [14.267619174518106]
We introduce Dive3D, a novel text-to-3D generation framework that replaces KL-based objectives with a Score Implicit Matching (SIM) loss. We validate Dive3D across various 2D-to-3D prompts and find that it consistently outperforms prior methods in qualitative assessments. Dive3D also achieves strong results on quantitative metrics, including text-asset alignment, 3D plausibility, text-geometry consistency, texture quality, and geometric detail.
arXiv Detail & Related papers (2025-06-16T15:21:30Z)
- Diverse Score Distillation [27.790458964072823]
We propose a score formulation that guides the optimization to follow generation paths defined by random initial seeds. We showcase the applications of our Diverse Score Distillation (DSD) formulation across tasks such as 2D optimization, text-based 3D inference, and single-view reconstruction.
arXiv Detail & Related papers (2024-12-09T18:59:02Z)
- A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
- DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping [20.7584503748821]
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance.
We conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images.
We introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation.
arXiv Detail & Related papers (2024-09-08T14:04:48Z)
- FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow [17.919092916953183]
We propose a novel framework, named FlowDreamer, that yields high-fidelity results with richer textual details and faster convergence.
The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise.
We introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory.
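A hedged sketch of the noise search that this coupling enables, assuming a rectified-flow convention where $t=0$ is the image and $t=1$ is noise; the velocity network `v_model` and all names are illustrative, not FlowDreamer's code.

```python
import torch

@torch.no_grad()
def image_to_noise(v_model, x0, y, steps=50):
    # Euler-integrate the rectified-flow ODE dx/dt = v(x, t) from t=0 (image)
    # to t=1 (noise); reversibility makes this noise the unique couple of x0.
    x = x0.clone()
    for i in range(steps):
        t = torch.full((x.shape[0],), i / steps, device=x.device)
        x = x + v_model(x, t, y) / steps
    return x
```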
arXiv Detail & Related papers (2024-08-09T11:40:20Z)
- Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior [87.55592645191122]
Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but remain vulnerable to geometry collapse and poor textures.
We propose a novel and effective "Consistent3D" method that explores the ODE deterministic sampling prior for text-to-3D generation.
Experimental results show the efficacy of our Consistent3D in generating high-fidelity and diverse 3D objects and large-scale scenes.
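For background on the deterministic sampling prior, the sketch below shows a generic DDIM-style deterministic ODE step of the kind such priors build on; it is illustrative context, not the paper's consistency distillation objective, and all names are assumptions.

```python
import torch

@torch.no_grad()
def deterministic_ode_step(eps_model, x_t, t, t_prev, y, alphas, sigmas):
    # One eta=0 (deterministic) DDIM step: predict the clean image, then
    # re-noise it to the earlier timestep along the same ODE trajectory.
    eps = eps_model(x_t, t, y)
    x0_pred = (x_t - sigmas[t] * eps) / alphas[t]
    return alphas[t_prev] * x0_pred + sigmas[t_prev] * eps
```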
arXiv Detail & Related papers (2024-01-17T08:32:07Z)
- Taming Mode Collapse in Score Distillation for Text-to-3D Generation [70.32101198891465]
"Janus" artifact is a problem in text-to-3D generation where the generated objects fake each view with multiple front faces.
We propose a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD).
Although embarrassingly straightforward, ESD proves to be an effective treatment for Janus artifacts in score distillation, as our experiments demonstrate.
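One plausible reading of the ESD update, assuming the entropy regularizer is realized by a classifier-free-guidance-style blend of the variational score with and without conditioning; the blend form, `lam`, and all names are assumptions, not the paper's code.

```python
import torch

@torch.no_grad()
def esd_direction(eps_pre, eps_var, x_t, t, y, cam, lam=0.5):
    # Entropy-regularized VSD-style direction: the fake (variational) score
    # is interpolated between its conditioned and unconditioned predictions.
    eps_real = eps_pre(x_t, t, y)
    eps_fake = lam * eps_var(x_t, t, y, cam) + (1.0 - lam) * eps_var(x_t, t, y, None)
    return eps_real - eps_fake
```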
arXiv Detail & Related papers (2023-12-31T22:47:06Z)
- Text-to-3D with Classifier Score Distillation [80.14832887529259]
We find that classifier-free guidance, commonly considered an auxiliary trick, is in fact the most essential component.
We name our method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
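A minimal sketch of the implicit-classifier direction underlying CSD (illustrative names, not the paper's code): the classifier score is the gap between text-conditioned and unconditional noise predictions, which the paper argues can drive the optimization on its own.

```python
import torch

@torch.no_grad()
def classifier_score(eps_model, x_t, t, y_text, y_null):
    # Implicit classifier score from classifier-free guidance: the difference
    # between conditional and unconditional noise predictions.
    return eps_model(x_t, t, y_text) - eps_model(x_t, t, y_null)
```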
arXiv Detail & Related papers (2023-10-30T10:25:40Z)
- Three Pillars improving Vision Foundation Model Distillation for Lidar [61.56521056618988]
We study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset.
Thanks to our scalable distillation method, named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement in feature quality.
arXiv Detail & Related papers (2023-10-26T15:54:43Z)
- Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation [39.50894560861625]
3DFuse is a novel framework that incorporates 3D awareness into pretrained 2D diffusion models.
We introduce a training strategy that enables the 2D diffusion model to learn to handle the errors and sparsity of the coarse 3D structure for robust generation.
arXiv Detail & Related papers (2023-03-14T14:24:31Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state of the art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)