Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
- URL: http://arxiv.org/abs/2408.15991v2
- Date: Wed, 25 Sep 2024 03:05:05 GMT
- Title: Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
- Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun
- Abstract summary: We propose Distribution Backtracking Distillation (DisBack) to accelerate the sampling of diffusion models.
DisBack achieves faster and better convergence than existing distillation methods, with an FID score of 1.38 on the ImageNet 64x64 dataset.
- Score: 19.88187051373436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a student generator to achieve one-step generation, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process, because existing methods mainly focus on using the endpoint of pre-trained diffusion models as teacher models, overlooking the importance of the convergence trajectory between the student generator and the teacher model. To address this issue, we extend the score distillation process by introducing the entire convergence trajectory of teacher models and propose Distribution Backtracking Distillation (DisBack). DisBack is composed of two stages: Degradation Recording and Distribution Backtracking. Degradation Recording is designed to obtain the convergence trajectory of the teacher model, which records the degradation path from the trained teacher model to the untrained initial student generator. The degradation path implicitly represents the teacher model's intermediate distributions, and its reverse can be viewed as the convergence trajectory from the student generator to the teacher model. Then Distribution Backtracking trains a student generator to backtrack the intermediate distributions along the path to approximate the convergence trajectory of teacher models. Extensive experiments show that DisBack achieves faster and better convergence than existing distillation methods and accomplishes comparable generation performance, with an FID score of 1.38 on the ImageNet 64x64 dataset. Notably, DisBack is easy to implement and can be generalized to existing distillation methods to boost performance. Our code is publicly available at https://github.com/SYZhang0805/DisBack.
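The two-stage procedure described in the abstract can be summarized in a short sketch. The following is a hypothetical, minimal PyTorch illustration, not the paper's implementation: it assumes toy 2-D data, a simple linear noising schedule, and a DMD/VSD-style score-difference update for the student, and all names (make_score_net, add_noise, predicted_eps, checkpoint counts, learning rates) are invented for illustration; the authors' actual code is in the repository linked above.

```python
# Hedged sketch of DisBack's two stages on toy 2-D data (assumptions throughout).
import copy

import torch
import torch.nn as nn


def make_score_net() -> nn.Module:
    # Epsilon-prediction network: input = noisy 2-D sample concatenated with t.
    return nn.Sequential(nn.Linear(3, 64), nn.SiLU(),
                         nn.Linear(64, 64), nn.SiLU(),
                         nn.Linear(64, 2))


def add_noise(x0: torch.Tensor, t: torch.Tensor):
    # Linear interpolation toward Gaussian noise (an assumption, not the paper's schedule).
    eps = torch.randn_like(x0)
    return (1 - t) * x0 + t * eps, eps


def predicted_eps(net: nn.Module, xt: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    return net(torch.cat([xt, t], dim=1))


teacher = make_score_net()  # stands in for a pre-trained teacher
generator = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))  # one-step student

# --- Stage 1: Degradation Recording ---------------------------------------
# Fine-tune a copy of the teacher toward the *initial* student's outputs and
# save intermediate checkpoints. Read in reverse, the saved path provides the
# intermediate targets of the convergence trajectory.
degraded = copy.deepcopy(teacher)
opt_d = torch.optim.Adam(degraded.parameters(), lr=1e-4)
checkpoints = [copy.deepcopy(teacher.state_dict())]
for step in range(300):
    with torch.no_grad():
        fake = generator(torch.randn(256, 2))  # samples from the untrained student
    t = torch.rand(256, 1)
    xt, eps = add_noise(fake, t)
    loss_d = ((predicted_eps(degraded, xt, t) - eps) ** 2).mean()  # denoise student samples
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    if (step + 1) % 100 == 0:
        checkpoints.append(copy.deepcopy(degraded.state_dict()))

# --- Stage 2: Distribution Backtracking ------------------------------------
# Walk the recorded path in reverse, distilling the student against each
# intermediate target with a score-difference gradient (DMD/VSD style).
target = make_score_net()
fake_score = copy.deepcopy(teacher)  # online score model of the student's distribution
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(fake_score.parameters(), lr=1e-4)
for ckpt in reversed(checkpoints):  # most degraded target first, original teacher last
    target.load_state_dict(ckpt)
    for _ in range(200):
        x0 = generator(torch.randn(256, 2))
        t = torch.rand(256, 1)
        xt, eps = add_noise(x0, t)
        with torch.no_grad():  # score difference between student and target distributions
            grad = predicted_eps(fake_score, xt, t) - predicted_eps(target, xt, t)
        loss_g = 0.5 * ((xt - (xt - grad).detach()) ** 2).mean()  # drives the difference to zero
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
        # Keep the fake-score network tracking the current student distribution.
        loss_f = ((predicted_eps(fake_score, xt.detach(), t) - eps) ** 2).mean()
        opt_f.zero_grad()
        loss_f.backward()
        opt_f.step()
```

The point the sketch tries to convey is that Stage 2 never distills directly against the final teacher at the start: the student first matches the checkpoint closest to its own initial distribution and only reaches the original teacher at the end of the outer loop, which is how the paper addresses the early-stage score mismatch.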
Related papers
- Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation [84.38105530043741]
We propose Warmup-Distill, which aligns the distillation of the student to that of the teacher in advance of distillation.
Experiments on the seven benchmarks demonstrate that Warmup-Distill could provide a warmup student more suitable for distillation.
arXiv Detail & Related papers (2025-02-17T12:58:12Z)
- Towards Training One-Step Diffusion Models Without Distillation [72.80423908458772]
We show that one-step generative models can be trained directly without this distillation process.
We propose a family of distillation methods that achieve competitive results without relying on score estimation.
arXiv Detail & Related papers (2025-02-11T23:02:14Z)
- Single Trajectory Distillation for Accelerating Image and Video Style Transfer [22.304420035048942]
Diffusion-based stylization methods typically denoise from a specific partial noise state for image-to-image and video-to-video tasks.
We propose single trajectory distillation (STD) starting from a specific partial noise state.
Our method surpasses existing acceleration models in terms of style similarity and aesthetic evaluations.
arXiv Detail & Related papers (2024-12-25T16:40:23Z)
- Inference-Time Diffusion Model Distillation [59.350789627086456]
We introduce Distillation++, a novel inference-time distillation framework.
Inspired by recent advances in conditional sampling, our approach recasts student model sampling as a proximal optimization problem.
We integrate distillation optimization during reverse sampling, which can be viewed as teacher guidance.
arXiv Detail & Related papers (2024-12-12T02:07:17Z)
- Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
arXiv Detail & Related papers (2024-05-23T17:59:49Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval [68.7008281316644]
ReDi is a learning-free Retrieval-based Diffusion sampling framework.
We show that ReDi improves model inference efficiency with a 2x speedup.
arXiv Detail & Related papers (2023-02-05T03:01:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.