Walking the Schrödinger Bridge: A Direct Trajectory for Text-to-3D Generation
- URL: http://arxiv.org/abs/2511.05609v1
- Date: Thu, 06 Nov 2025 09:21:57 GMT
- Title: Walking the Schrödinger Bridge: A Direct Trajectory for Text-to-3D Generation
- Authors: Ziying Li, Xuequan Lu, Xinkui Zhao, Guanjie Cheng, Shuiguang Deng, Jianwei Yin,
- Abstract summary: We introduce Trajectory-Centric Distillation (TraCe), a novel text-to-3D generation framework. TraCe consistently achieves superior quality and fidelity compared to state-of-the-art techniques.
- Score: 51.337622918786074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in optimization-based text-to-3D generation heavily rely on distilling knowledge from pre-trained text-to-image diffusion models using techniques like Score Distillation Sampling (SDS), which often introduce artifacts such as over-saturation and over-smoothing into the generated 3D assets. In this paper, we address this essential problem by formulating the generation process as learning an optimal, direct transport trajectory between the distribution of the current rendering and the desired target distribution, thereby enabling high-quality generation with smaller Classifier-free Guidance (CFG) values. First, we theoretically establish SDS as a simplified instance of the Schrödinger Bridge framework. We prove that SDS employs the reverse process of a Schrödinger Bridge which, under specific conditions (e.g., Gaussian noise at one end), collapses to the score function of the pre-trained diffusion model used in SDS. Based on this, we introduce Trajectory-Centric Distillation (TraCe), a novel text-to-3D generation framework, which reformulates the mathematically tractable Schrödinger Bridge framework to explicitly construct a diffusion bridge from the current rendering to its text-conditioned, denoised target, and trains a LoRA-adapted model on this trajectory's score dynamics for robust 3D optimization. Comprehensive experiments demonstrate that TraCe consistently achieves superior quality and fidelity compared to state-of-the-art techniques.
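The SDS baseline that the abstract builds on can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the paper's implementation: `toy_denoiser` fakes a pretrained text-conditioned noise predictor, and the renderer Jacobian is taken as identity. The point is the shape of the SDS update, w(t) * (eps_hat - eps), and why a large CFG scale inside the denoiser tends to over-drive it, the failure mode TraCe's bridge formulation targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, cfg_scale=7.5):
    """Toy stand-in (hypothetical) for a pretrained noise predictor eps_hat(x_t; y, t).
    Conditional/unconditional branches are faked as pulls toward fixed points."""
    target = np.full_like(x_t, 0.5)   # pretend text-conditioned mode
    eps_cond = x_t - target
    eps_uncond = x_t - np.zeros_like(x_t)
    # Classifier-free guidance; SDS typically needs a large cfg_scale here
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

def sds_gradient(rendering, alpha=0.8):
    """SDS gradient w.r.t. the rendering: w(t) * (eps_hat - eps).
    The Jacobian d(rendering)/d(theta) is omitted (identity in this sketch)."""
    eps = rng.standard_normal(rendering.shape)
    x_t = np.sqrt(alpha) * rendering + np.sqrt(1 - alpha) * eps  # forward diffuse
    eps_hat = toy_denoiser(x_t)
    w_t = 1.0 - alpha                                            # weighting w(t)
    return w_t * (eps_hat - eps)

# One gradient step on a toy 4x4 "rendering"
rendering = rng.standard_normal((4, 4))
grad = sds_gradient(rendering)
rendering -= 0.01 * grad
print(grad.shape)  # (4, 4)
```

TraCe's departure, per the abstract, is to replace this single pretrained-score pull with the score dynamics of an explicitly constructed bridge from the current rendering to its denoised target, learned via a LoRA-adapted model, which is what permits the smaller CFG values.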
Related papers
- Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference [69.34278282513593]
Preference Score Distillation (PSD) is an optimization-based framework for human-aligned text-to-3D synthesis without 3D training data. Our key insight stems from the incompatibility of pixel-level gradients. We introduce an adaptive strategy to co-optimize preference scores and negative text embeddings.
arXiv Detail & Related papers (2026-03-02T08:23:36Z) - AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation [56.399153019429605]
This work shows that ignoring source dynamics yields inconsistent trajectories that suppress or merge semantic cues. We reformulate text-to-3D optimization as mapping a dynamically evolving source distribution to a fixed target distribution. We introduce AnchorDS, an improved score distillation mechanism that provides state-anchored guidance with image conditions.
arXiv Detail & Related papers (2025-11-12T09:51:23Z) - Score Distillation of Flow Matching Models [67.86066177182046]
We extend Score identity Distillation (SiD) to pretrained text-to-image flow-matching models. SiD works out of the box across these models, in both data-free and data-aided settings. This provides the first systematic evidence that score distillation applies broadly to text-to-image flow-matching models.
arXiv Detail & Related papers (2025-09-29T17:45:48Z) - Consistency Diffusion Models for Single-Image 3D Reconstruction with Priors [24.086775858948755]
We introduce a pioneering training framework under diffusion models. We convert 3D structural priors derived from the initial 3D point cloud into a bound term. We extract and incorporate 2D priors from the single input image, projecting them onto the 3D point cloud to enrich the guidance for diffusion training.
arXiv Detail & Related papers (2025-01-28T06:21:57Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping [20.7584503748821]
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance.
We conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images.
We introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation.
arXiv Detail & Related papers (2024-09-08T14:04:48Z) - FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow [17.919092916953183]
We propose a novel framework, named FlowDreamer, which yields high fidelity results with richer textual details and faster convergence.
Key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise.
We introduce a novel Unique Matching Couple (UCM) loss, which guides the 3D model to optimize along the same trajectory.
arXiv Detail & Related papers (2024-08-09T11:40:20Z) - CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z) - Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z) - Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empirical settings of anchors, which makes the algorithms inelegant.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
arXiv Detail & Related papers (2023-09-05T08:49:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.