Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
- URL: http://arxiv.org/abs/2510.27684v1
- Date: Fri, 31 Oct 2025 17:55:10 GMT
- Title: Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
- Authors: Xiangyu Fan, Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang, Zhiqian Lin, Tianxiang Ren, Dahua Lin, Ruihao Gong, Lei Yang
- Abstract summary: Phased DMD is a multi-step distillation framework that bridges the idea of phase-wise distillation with Mixture-of-Experts. It is built upon two key ideas: progressive distribution matching and score matching within subintervals. Experimental results demonstrate that Phased DMD preserves output diversity better than DMD while retaining key generative capabilities.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distribution Matching Distillation (DMD) distills score-based generative models into efficient one-step generators, without requiring a one-to-one correspondence with the sampling trajectories of their teachers. However, limited model capacity causes one-step distilled models to underperform on complex generative tasks, e.g., synthesizing intricate object motions in text-to-video generation. Directly extending DMD to multi-step distillation increases memory usage and computational depth, leading to instability and reduced efficiency. While prior works propose stochastic gradient truncation as a potential solution, we observe that it substantially reduces the generation diversity of multi-step distilled models, bringing it down to the level of their one-step counterparts. To address these limitations, we propose Phased DMD, a multi-step distillation framework that bridges the idea of phase-wise distillation with Mixture-of-Experts (MoE), reducing learning difficulty while enhancing model capacity. Phased DMD is built upon two key ideas: progressive distribution matching and score matching within subintervals. First, we divide the SNR range into subintervals and progressively refine the model toward higher SNR levels to better capture complex distributions. Second, to ensure that the training objective within each subinterval is accurate, we conduct rigorous mathematical derivations. We validate Phased DMD by distilling state-of-the-art image and video generation models, including Qwen-Image (20B parameters) and Wan2.2 (28B parameters). Experimental results demonstrate that Phased DMD preserves output diversity better than DMD while retaining key generative capabilities. We will release our code and models.
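The abstract's core mechanism can be illustrated with a minimal sketch: the diffusion timestep range (which maps monotonically to SNR) is split into contiguous subintervals, and each phase draws its training timesteps only from its own subinterval, progressing from the noisiest phase toward the cleanest. This is an illustrative sketch only; the function names and the uniform phase split are assumptions for exposition, not details from the paper.

```python
import numpy as np

def partition_timesteps(num_steps: int, num_phases: int):
    """Split the timestep range [0, num_steps) into contiguous subintervals,
    one per phase. Phases are ordered from high noise (low SNR) toward
    low noise (high SNR), matching a progressive refinement schedule.
    Illustrative sketch; an even split is assumed, not taken from the paper."""
    edges = np.linspace(0, num_steps, num_phases + 1, dtype=int)
    return [(int(edges[i]), int(edges[i + 1])) for i in reversed(range(num_phases))]

def sample_timestep(rng: np.random.Generator, interval):
    """Draw a training timestep uniformly from the current phase's subinterval,
    so score matching during that phase only sees noise levels inside it."""
    lo, hi = interval
    return int(rng.integers(lo, hi))
```

For example, with 1000 timesteps and 4 phases, the first phase trains on timesteps in [750, 1000) and the last on [0, 250); a distinct expert per phase is what gives the framework its Mixture-of-Experts flavor.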
Related papers
- Transition Matching Distillation for Fast Video Generation [63.1049790376783]
We present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators. TMD matches the multi-step denoising trajectory of a diffusion model with a few-step probability transition process, providing a flexible and strong trade-off between generation speed and visual quality.
arXiv Detail & Related papers (2026-01-14T21:30:03Z) - Distribution Matching Distillation Meets Reinforcement Learning [30.960105413888943]
We propose DMDR, a novel framework that incorporates Reinforcement Learning (RL) techniques into the distillation process. We show that for the RL of the few-step generator, the DMD loss itself is a more effective regularization than traditional ones. Experiments demonstrate that DMDR achieves leading visual quality and prompt coherence among few-step methods, and can even exceed the performance of the multi-step teacher.
arXiv Detail & Related papers (2025-11-17T17:59:54Z) - Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis [65.77083310980896]
We propose Adversarial Distribution Matching (ADM) to align latent predictions between real and fake score estimators for score distillation. Our proposed method achieves superior one-step performance on SDXL compared to DMD2 while consuming less GPU time. Additional experiments applying multi-step ADM distillation to SD3-Medium, SD3.5-Large, and CogVideoX set a new benchmark for efficient image and video synthesis.
arXiv Detail & Related papers (2025-07-24T16:45:05Z) - EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum-likelihood-based approach that distills a diffusion model into a one-step generator with minimal loss of quality. We develop a reparametrized sampling scheme and a noise-cancellation technique that together stabilize the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z) - Diffusion Bridge Implicit Models [25.213664260896103]
Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions. We take the first step toward fast sampling of DDBMs without extra training, motivated by well-established recipes for diffusion models. We induce a novel, simple, and insightful form of ordinary differential equation (ODE) that inspires high-order numerical solvers.
arXiv Detail & Related papers (2024-05-24T19:08:30Z) - Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lifts this limitation and improves DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
arXiv Detail & Related papers (2024-05-23T17:59:49Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis [20.2271205957037]
Hyper-SD is a novel framework that amalgamates the advantages of ODE Trajectory Preservation and Reformulation.
We introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments.
We incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process.
arXiv Detail & Related papers (2024-04-21T15:16:05Z)