Improved Distribution Matching Distillation for Fast Image Synthesis
- URL: http://arxiv.org/abs/2405.14867v2
- Date: Fri, 24 May 2024 17:08:32 GMT
- Title: Improved Distribution Matching Distillation for Fast Image Synthesis
- Authors: Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman
- Abstract summary: We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
- Score: 54.72356560597428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent approaches have shown promise in distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss computed using a large set of noise-image pairs generated by the teacher with many steps of a deterministic sampler. This is costly for large-scale text-to-image synthesis and limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. First, we eliminate the regression loss and the need for expensive dataset construction. We show that the resulting instability is due to the fake critic not estimating the distribution of generated samples accurately and propose a two time-scale update rule as a remedy. Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images. This lets us train the student model on real data, mitigating the imperfect real score estimation from the teacher model and enhancing quality. Lastly, we modify the training procedure to enable multi-step sampling. We identify and address the training-inference input mismatch problem in this setting by simulating inference-time generator samples during training. Taken together, our improvements set new benchmarks in one-step image generation, with FID scores of 1.28 on ImageNet-64x64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher despite a 500X reduction in inference cost. Further, we show our approach can generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods.
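The abstract compresses three training changes into a few sentences; the sketch below lays them out as a single toy training step. This is a minimal illustration under assumed forms, not the authors' released implementation: the tiny stand-in networks, the noise schedule, the 5:1 critic-to-generator update ratio, and the GAN loss weight are all placeholders.

```python
# Self-contained PyTorch sketch of one DMD2 training step per the abstract:
# (1) the regression loss is dropped and the fake critic is instead refreshed
# several times per generator update (a two time-scale update rule),
# (2) a GAN discriminator compares generated samples against real images, and
# (3) the multi-step student trains on inputs from its own sampler, matching
# inference time. All modules, schedules, and weights below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in for a UNet: maps (noisy image, timestep) to a denoised image."""
    def __init__(self, ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, ch, 3, padding=1))

    def forward(self, x, t):
        # Broadcast the scalar timestep as an extra conditioning channel.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.body(torch.cat([x, t_map], dim=1))

class TinyDisc(nn.Module):
    """Stand-in GAN discriminator: image -> real/fake logit."""
    def __init__(self, ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.body(x).mean(dim=(1, 2, 3))

def simulate_inference(generator, noise, step_ts=(1.0, 0.75, 0.5, 0.25)):
    """Run the student's own few-step sampler during training so its inputs
    match what it sees at inference time (the input-mismatch fix)."""
    x = noise
    for i, t in enumerate(step_ts):
        t_b = torch.full((x.shape[0],), t, device=x.device)
        x = generator(x, t_b)
        if i < len(step_ts) - 1:               # re-noise between steps only
            x = x + step_ts[i + 1] * torch.randn_like(x)
    return x

def dmd2_step(generator, fake_critic, teacher, disc,
              g_opt, c_opt, d_opt, real_images, critic_updates=5):
    # Two time-scale update rule: refresh the fake critic (and discriminator)
    # several times per generator step so the critic accurately tracks the
    # moving distribution of generated samples, replacing the regression loss.
    for _ in range(critic_updates):
        with torch.no_grad():
            fake = simulate_inference(generator, torch.randn_like(real_images))
        t = torch.rand(fake.shape[0], device=fake.device)
        noisy_fake = fake + t.view(-1, 1, 1, 1) * torch.randn_like(fake)
        c_loss = F.mse_loss(fake_critic(noisy_fake, t), fake)  # denoising loss
        c_opt.zero_grad(); c_loss.backward(); c_opt.step()
        d_loss = (F.softplus(disc(fake)).mean()                # fake -> low
                  + F.softplus(-disc(real_images)).mean())     # real -> high
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: distribution-matching gradient (fake-critic prediction
    # minus frozen-teacher prediction) plus a non-saturating GAN loss.
    fake = simulate_inference(generator, torch.randn_like(real_images))
    t = torch.rand(fake.shape[0], device=fake.device)
    noisy = fake + t.view(-1, 1, 1, 1) * torch.randn_like(fake)
    with torch.no_grad():
        grad = fake_critic(noisy, t) - teacher(noisy, t)
    dmd_loss = 0.5 * F.mse_loss(fake, (fake - grad).detach())
    g_loss = dmd_loss + 1e-2 * F.softplus(-disc(fake)).mean()  # weight assumed
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

if __name__ == "__main__":
    gen, critic, teacher, disc = TinyNet(), TinyNet(), TinyNet(), TinyDisc()
    g_opt, c_opt, d_opt = (torch.optim.Adam(m.parameters(), lr=1e-4)
                           for m in (gen, critic, disc))
    real = torch.randn(4, 3, 16, 16)  # placeholder batch of "real" images
    dmd2_step(gen, critic, teacher, disc, g_opt, c_opt, d_opt, real)
```

The structural point is the two time-scale loop: extra critic updates keep the fake-score estimate fresh enough to stand in for the dropped regression loss, while the GAN branch lets real images, not only teacher scores, shape the student.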
Related papers
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- Diffusion Models Are Innate One-Step Generators [2.3359837623080613]
Diffusion Models (DMs) can generate remarkably high-quality results.
DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step.
Our method achieves the SOTA results on CIFAR-10, AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency.
arXiv Detail & Related papers (2024-05-31T11:14:12Z)
- EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality.
We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilizes the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z)
- Directly Denoising Diffusion Models [6.109141407163027]
We present Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling.
Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models.
For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
- ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing the consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on the CIFAR10, ImageNet 64x64, and LSUN Cat 256x256 datasets.
arXiv Detail & Related papers (2023-11-23T16:49:06Z)
- SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results.
But their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)