DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
- URL: http://arxiv.org/abs/2409.03550v1
- Date: Thu, 5 Sep 2024 14:12:22 GMT
- Title: DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
- Authors: Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie
- Abstract summary: Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas.
The most common way to accelerate DMs involves reducing the number of denoising steps during generation.
We propose a novel method that transfers the capability of large pretrained DMs to faster architectures.
- Score: 69.58440626023541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas, but they are hindered by slow inference speeds and high computational demands during deployment. The most common way to accelerate DMs involves reducing the number of denoising steps during generation, achieved through faster sampling solvers or knowledge distillation (KD). In contrast to prior approaches, we propose a novel method that transfers the capability of large pretrained DMs to faster architectures. Specifically, we employ KD in a distinct manner to compress DMs by distilling their generative ability into more rapid variants. Furthermore, considering that the source data is either inaccessible or too large to store for current generative models, we introduce a new paradigm for their distillation without source data, termed Data-Free Knowledge Distillation for Diffusion Models (DKDM). Our DKDM framework comprises two main components: 1) a DKDM objective that uses synthetic denoising data produced by pretrained DMs to optimize faster DMs without source data, and 2) a dynamic iterative distillation method that flexibly organizes the synthesis of denoising data, preventing the slow generation of this data from becoming a bottleneck in the optimization process. To our knowledge, this is the first attempt at using KD to distill DMs into any architecture in a data-free manner. Importantly, our DKDM is orthogonal to most existing acceleration methods, such as denoising step reduction, quantization and pruning. Experiments show that our DKDM can derive DMs that are 2x faster while their performance remains on par with the baseline. Notably, our DKDM enables pretrained DMs to function as "datasets" for training new DMs.
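A minimal sketch of the core idea, assuming DDPM-style noise-prediction networks for both teacher and student: the pretrained teacher alone produces noisy intermediate states ("denoising data"), and the student is optimized to match the teacher's noise predictions on those states, so no source data is ever touched. The `scheduler` interface, the buffer, and all function names below are illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def synthesize_denoising_data(teacher, scheduler, batch_size, shape, device):
    """Collect noisy intermediate states (x_t, t) using only the teacher DM."""
    x = torch.randn(batch_size, *shape, device=device)   # start from pure noise
    buffer = []
    for t in reversed(range(scheduler.num_steps)):        # hypothetical scheduler API
        t_batch = torch.full((batch_size,), t, device=device, dtype=torch.long)
        buffer.append((x.clone(), t_batch))               # cache the noisy state as "data"
        eps = teacher(x, t_batch)                         # teacher's noise prediction
        x = scheduler.step(x, eps, t)                     # one reverse-diffusion step
    return buffer

def dkdm_style_loss(student, teacher, x_t, t):
    """Train the student to mimic the teacher's denoising behavior on synthetic states."""
    with torch.no_grad():
        target = teacher(x_t, t)
    return F.mse_loss(student(x_t, t), target)
```

In this sketch the buffer stands in, crudely, for the paper's dynamic iterative distillation: in practice the synthesis of denoising data would be interleaved with student updates and refreshed incrementally, so that the teacher's slow sampling does not stall optimization.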
Related papers
- Slight Corruption in Pre-training Data Makes Better Diffusion Models [71.90034201302397]
Diffusion models (DMs) have shown remarkable capabilities in generating high-quality images, audios, and videos.
DMs benefit significantly from extensive pre-training on large-scale datasets.
However, pre-training datasets often contain corrupted pairs where conditions do not accurately describe the data.
This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs.
arXiv Detail & Related papers (2024-05-30T21:35:48Z)
- BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models [39.287947829085155]
This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient.
From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full precision to accurate binarization.
Experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths.
arXiv Detail & Related papers (2024-04-08T16:46:25Z)
- Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon [16.416356358224842]
Diffusion models (DMs) are a powerful generative framework that has attracted significant attention in recent years.
We propose two strategies to accelerate the training of DMs.
arXiv Detail & Related papers (2024-03-14T13:27:04Z)
- Fast Diffusion Model [122.36693015093041]
Diffusion models (DMs) have been adopted across diverse fields thanks to their ability to capture intricate data distributions.
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a DM optimization perspective.
arXiv Detail & Related papers (2023-06-12T09:38:04Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations via an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation [94.4634088113513]
Diffusion-NAT introduces discrete diffusion models into non-autoregressive (NAR) text-to-text generation and integrates BART to improve performance.
Experimental results on 7 datasets show that our approach can outperform competitive NAR methods, and even surpass autoregressive methods.
arXiv Detail & Related papers (2023-05-06T13:20:31Z)
- Post-training Quantization on Diffusion Models [14.167428759401703]
Denoising diffusion (score-based) generative models have recently achieved significant success in generating realistic and diverse data.
These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise.
Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations.
arXiv Detail & Related papers (2022-11-28T19:33:39Z)
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) performs knowledge distillation without relying on the original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.