SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
- URL: http://arxiv.org/abs/2506.00523v1
- Date: Sat, 31 May 2025 11:59:02 GMT
- Title: SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
- Authors: Xingtong Ge, Xin Zhang, Tongda Xu, Yi Zhang, Xinjie Zhang, Yan Wang, Jun Zhang
- Abstract summary: Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX.
- Score: 12.842428916585217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Along with other improvements such as scaled-up discriminator models, our final model, dubbed **SenseFlow**, achieves superior distillation performance for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.
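The abstract does not give the IDA objective in closed form; the sketch below only illustrates the overall structure of a DMD-style generator update with an added alignment regularizer. All names (`generator`, `teacher_v`, `fake_v`) and the specific IDA term are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def senseflow_style_generator_loss(generator, teacher_v, fake_v, z, c, ida_weight=0.5):
    """Minimal sketch of a DMD-style generator update with an IDA-style
    regularizer, assuming a velocity-prediction (flow-matching) teacher.
    The abstract only states that IDA regularizes the distance between
    the generator and the fake distribution; this anchor is assumed."""
    x_g = generator(z, c)                      # one-step sample from noise z, prompt c
    t = torch.rand(x_g.shape[0], 1, 1, 1, device=x_g.device)
    noise = torch.randn_like(x_g)
    x_t = (1.0 - t) * x_g + t * noise          # rectified-flow interpolation

    with torch.no_grad():
        v_real = teacher_v(x_t, t, c)          # frozen teacher velocity
        v_fake = fake_v(x_t, t, c)             # online fake-branch velocity
        # DMD direction: move x_g from the fake distribution toward the real one.
        target = (x_g - (v_fake - v_real)).detach()
        # Assumed IDA anchor: the fake branch's own denoised estimate of x_g.
        x0_fake = x_t - t * v_fake

    dmd_loss = 0.5 * F.mse_loss(x_g, target)
    ida_loss = F.mse_loss(x_g, x0_fake)        # keeps generator and fake branch aligned
    return dmd_loss + ida_weight * ida_loss
```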
Related papers
- Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis [65.77083310980896]
We propose Adversarial Distribution Matching (ADM) to align latent predictions between real and fake score estimators for score distillation. Our proposed method achieves superior one-step performance on SDXL compared to DMD2 while consuming less GPU time. Additional experiments that apply multi-step ADM distillation to SD3-Medium, SD3.5-Large, and CogVideoX set a new benchmark for efficient image and video synthesis.
arXiv Detail & Related papers (2025-07-24T16:45:05Z) - Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning [61.04787144322498]
Diffusion-based models, widely used in text-to-image generation, have proven effective in 2D representation learning. We propose PointSD, a framework that leverages the SD model for 3D self-supervised learning.
arXiv Detail & Related papers (2025-07-12T01:20:07Z) - LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation [1.1847464266302488]
Flow Matching (FM) is a powerful generative modeling paradigm based on a simulation-free training objective instead of the score-based one used in diffusion models (DMs). We present Learned Distribution-guided Flow Matching (LeDiFlow), a novel scalable method for training FM-based image generation models. Our method utilizes a State-Of-The-Art (SOTA) transformer architecture combined with latent space sampling and can be trained on a consumer workstation.
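For orientation, the simulation-free flow matching objective referred to above is conventionally written as follows (the standard formulation, not LeDiFlow's specific variant):

$$\mathcal{L}_{\mathrm{FM}}(\theta)=\mathbb{E}_{t\sim\mathcal{U}[0,1],\;x_0\sim p_{\mathrm{data}},\;x_1\sim\mathcal{N}(0,I)}\big\lVert v_\theta(x_t,t)-(x_1-x_0)\big\rVert^2,\qquad x_t=(1-t)\,x_0+t\,x_1.$$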
arXiv Detail & Related papers (2025-05-27T05:07:37Z) - ProReflow: Progressive Reflow with Decomposed Velocity [52.249464542399636]
Flow matching aims to reflow the diffusion process of diffusion models into a straight line for few-step and even one-step generation. We introduce progressive reflow, which progressively reflows the diffusion model over local timesteps until the whole diffusion process is reflowed. We also introduce aligned v-prediction, which highlights the importance of direction matching over magnitude matching in flow matching.
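The abstract does not state the aligned v-prediction loss; one natural reading of "direction matching over magnitude matching" is a cosine-style term such as

$$\mathcal{L}_{\mathrm{dir}}=1-\frac{\langle v_\theta(x_t,t),\,v^{\ast}\rangle}{\lVert v_\theta(x_t,t)\rVert\,\lVert v^{\ast}\rVert},$$

where $v^{\ast}$ is the target velocity. This form is an assumption for illustration, not the paper's exact formulation.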
arXiv Detail & Related papers (2025-03-05T04:50:53Z) - One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR method based on flow matching models. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR model. Second, to improve image realism and address high-frequency artifacts in generated images, we propose TV-LPIPS as a perceptual loss.
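The abstract names TV-LPIPS without defining it; a plausible composition of a total-variation penalty with the public `lpips` package is sketched below. The combination rule and `tv_weight` are assumptions, not FluxSR's actual loss.

```python
import torch
import lpips  # pip install lpips

# LPIPS perceptual distance with a VGG backbone; expects inputs in [-1, 1].
_lpips = lpips.LPIPS(net="vgg")

def total_variation(x: torch.Tensor) -> torch.Tensor:
    # Anisotropic TV over the spatial dims of a (B, C, H, W) batch;
    # penalizes high-frequency oscillation such as ringing artifacts.
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return dh + dw

def tv_lpips(pred: torch.Tensor, target: torch.Tensor, tv_weight: float = 0.1) -> torch.Tensor:
    # Assumed form: perceptual term plus a TV penalty on the prediction.
    return _lpips(pred, target).mean() + tv_weight * total_variation(pred)
```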
arXiv Detail & Related papers (2025-02-04T04:11:29Z) - Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow [65.51671121528858]
Diffusion models have greatly improved visual generation but are hindered by slow generation speed due to the computationally intensive nature of solving generative ODEs.
Rectified flow, a widely recognized solution, improves generation speed by straightening the ODE path.
We propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models.
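For context, the reflow (straightening) objective that Rectified Diffusion generalizes retrains the velocity field on straight-line targets between coupled endpoints:

$$\min_\theta\;\mathbb{E}_{(x_0,x_1)\sim\gamma}\int_0^1\big\lVert(x_1-x_0)-v_\theta\big((1-t)\,x_0+t\,x_1,\;t\big)\big\rVert^2\,dt,$$

where $\gamma$ is the coupling obtained by simulating the previous model's ODE. This is the standard rectified-flow formulation, shown here only for orientation.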
arXiv Detail & Related papers (2024-10-09T17:43:38Z) - Unleashing the Power of One-Step Diffusion based Image Super-Resolution via a Large-Scale Diffusion Discriminator [81.81748032199813]
Diffusion models have demonstrated excellent performance for real-world image super-resolution (Real-ISR). We propose a new One-Step Diffusion model with a larger-scale Discriminator for SR. Our discriminator is able to distill noisy features from any time step of diffusion models in the latent space.
arXiv Detail & Related papers (2024-10-05T16:41:36Z) - FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow [17.919092916953183]
We propose a novel framework, named FlowDreamer, which yields high-fidelity results with richer textual details and faster convergence.
The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise.
We introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory.
arXiv Detail & Related papers (2024-08-09T11:40:20Z) - PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher [55.22994720855957]
PaGoDA is a novel pipeline that reduces the training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion, and progressive super-resolution.
With the proposed pipeline, PaGoDA achieves a $64\times$ reduction in the cost of training its diffusion model on $8\times$ downsampled data.
PaGoDA's pipeline can be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models.
arXiv Detail & Related papers (2024-05-23T17:39:09Z) - SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis [8.648456572970035]
We present SDXL, a latent diffusion model for text-to-image synthesis.
Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone.
We show drastically improved performance compared to previous versions of Stable Diffusion.
arXiv Detail & Related papers (2023-07-04T23:04:57Z) - The Score-Difference Flow for Implicit Generative Modeling [1.1929584800629673]
Implicit generative modeling (IGM) aims to produce samples of synthetic data matching a target data distribution. Recent work has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution. We present the score difference between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them.
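Reconstructing the notation from the abstract (not copied from the paper): with target distribution $p$ and current source distribution $q_t$, the score-difference flow transports samples along

$$\dot{x}_t=\nabla_{x}\log p(x_t)-\nabla_{x}\log q_t(x_t),$$

which, per the stated claim, optimally reduces $D_{\mathrm{KL}}(q_t\,\Vert\,p)$.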
arXiv Detail & Related papers (2023-04-25T15:21:12Z)