DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
- URL: http://arxiv.org/abs/2409.03550v1
- Date: Thu, 5 Sep 2024 14:12:22 GMT
- Title: DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
- Authors: Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie
- Abstract summary: Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas.
The most common way to accelerate DMs involves reducing the number of denoising steps during generation.
We propose a novel method that transfers the capability of large pretrained DMs to faster architectures.
- Score: 69.58440626023541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas, but they are hindered by slow inference speeds and high computational demands during deployment. The most common way to accelerate DMs involves reducing the number of denoising steps during generation, achieved through faster sampling solvers or knowledge distillation (KD). In contrast to prior approaches, we propose a novel method that transfers the capability of large pretrained DMs to faster architectures. Specifically, we employ KD in a distinct manner to compress DMs by distilling their generative ability into more rapid variants. Furthermore, considering that the source data is either inaccessible or too large to store for current generative models, we introduce a new paradigm for their distillation without source data, termed Data-Free Knowledge Distillation for Diffusion Models (DKDM). Our DKDM framework comprises two main components: 1) a DKDM objective that uses synthetic denoising data produced by pretrained DMs to optimize faster DMs without source data, and 2) a dynamic iterative distillation method that flexibly organizes the synthesis of denoising data, preventing the slow generation process from becoming a bottleneck during optimization. To our knowledge, this is the first attempt at using KD to distill DMs into any architecture in a data-free manner. Importantly, our DKDM is orthogonal to most existing acceleration methods, such as denoising step reduction, quantization, and pruning. Experiments show that our DKDM derives DMs that are 2x faster while performing on par with the baseline. Notably, our DKDM enables pretrained DMs to function as "datasets" for training new DMs.
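The recipe described in the abstract can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch sketch of a data-free distillation loop in the spirit of the DKDM objective: a frozen pretrained teacher DM synthesizes noisy intermediate states along its own reverse process, and a smaller student noise-prediction network is trained to match the teacher's predictions on those states. The network classes, noise schedule, hyperparameters, and exact loss below are assumptions made for illustration; they are not the paper's implementation, and the dynamic iterative distillation scheduling is not modeled here.

```python
# Illustrative sketch only: the teacher DM plays the role of the "dataset".
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEpsNet(nn.Module):
    """Stand-in for a noise-prediction network eps(x_t, t); a real DM would use a UNet."""
    def __init__(self, width):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, x_t, t):
        # Broadcast the normalized timestep as an extra input channel (assumes T = 1000).
        t_map = (t.float() / 1000.0).view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, t_map], dim=1))

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

teacher = TinyEpsNet(width=64).eval()          # assumed pretrained; kept frozen
student = TinyEpsNet(width=16)                 # smaller, faster architecture to distill into
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

@torch.no_grad()
def synthesize_denoising_batch(batch=8, size=32):
    """Run the teacher's reverse process from pure noise and return one
    intermediate state (x_t, t). No source data is ever read."""
    keep_t = int(torch.randint(0, T - 1, (1,)))        # timestep at which to stop
    x = torch.randn(batch, 3, size, size)              # start from x_{T-1} ~ N(0, I)
    for t in range(T - 1, keep_t, -1):                 # DDPM step: x_t -> x_{t-1}
        tt = torch.full((batch,), t, dtype=torch.long)
        eps = teacher(x, tt)
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t - 1 > keep_t:                             # simplification: no noise on the kept step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x, torch.full((batch,), keep_t, dtype=torch.long)

for step in range(100):                                # toy optimization loop
    # Note: synthesizing denoising data this way is expensive; scheduling that cost
    # is exactly what the paper's dynamic iterative distillation addresses.
    x_t, t = synthesize_denoising_batch()
    with torch.no_grad():
        target = teacher(x_t, t)                       # teacher prediction acts as the label
    loss = F.mse_loss(student(x_t, t), target)         # DKDM-style objective (illustrative)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because every training pair is produced by the teacher itself, the pretrained model effectively serves as the "dataset", which is the paradigm the abstract describes; the paper's dynamic iterative distillation additionally reorganizes when and how these states are synthesized so that slow generation does not stall optimization.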
Related papers
- Sparse-to-Sparse Training of Diffusion Models [13.443846454835867]
This paper introduces, for the first time, the paradigm of sparse-to-sparse training to DMs.
We focus on unconditional generation and train sparse DMs from scratch on six datasets.
Our experiments show that sparse DMs are able to match and often outperform their counterparts, while substantially reducing the number of trainable parameters and FLOPs.
arXiv Detail & Related papers (2025-04-30T07:28:11Z)
- Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning.
Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training.
To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
- Slight Corruption in Pre-training Data Makes Better Diffusion Models [71.90034201302397]
Diffusion models (DMs) have shown remarkable capabilities in generating high-quality images, audios, and videos.
DMs benefit significantly from extensive pre-training on large-scale datasets.
However, pre-training datasets often contain corrupted pairs where conditions do not accurately describe the data.
This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs.
arXiv Detail & Related papers (2024-05-30T21:35:48Z)
- BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models [39.287947829085155]
This paper proposes BinaryDM, a novel weight binarization approach that pushes binarized DMs to be both accurate and efficient.
From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) that enables DMs to evolve smoothly from full precision to an accurately binarized form.
Experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths.
arXiv Detail & Related papers (2024-04-08T16:46:25Z)
- Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon [16.416356358224842]
Diffusion models (DMs) are a powerful generative framework that have attracted significant attention in recent years.
We propose two strategies to accelerate the training of DMs.
arXiv Detail & Related papers (2024-03-14T13:27:04Z)
- Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [66.87070857705994]
In low-resource settings, the amount of seed data samples to use for data augmentation is very small.
We propose a novel method that augments training data by incorporating a wealth of examples from other datasets.
This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone.
arXiv Detail & Related papers (2024-02-21T02:45:46Z)
- Fast Diffusion Model [122.36693015093041]
Diffusion models (DMs) have been adopted across diverse fields with their abilities in capturing intricate data distributions.
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a DM optimization perspective.
arXiv Detail & Related papers (2023-06-12T09:38:04Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has recently been proposed as a remedy for slow sampling, reducing the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation [94.4634088113513]
Diffusion-NAT introduces discrete diffusion models into NAR text-to-text generation and integrates BART to improve the performance.
Experimental results on 7 datasets show that our approach can outperform competitive NAR methods, and even surpass autoregressive methods.
arXiv Detail & Related papers (2023-05-06T13:20:31Z)
- A Comprehensive Survey on Knowledge Distillation of Diffusion Models [0.0]
Diffusion Models (DMs) utilize neural networks to specify score functions.
Our tutorial is intended for individuals with a basic understanding of generative models who wish to apply distillation of DMs or embark on a research project in this field.
arXiv Detail & Related papers (2023-04-09T15:49:28Z)
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset of synthetic samples such that models trained on it yield performance comparable to models trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- Post-training Quantization on Diffusion Models [14.167428759401703]
Denoising diffusion (score-based) generative models have recently made significant strides in generating realistic and diverse data.
These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise.
Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations.
arXiv Detail & Related papers (2022-11-28T19:33:39Z)
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) performs knowledge distillation without relying on the original training data.
We propose a prompt-based method, termed PromptDFD, that takes advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)