Up to 100x Faster Data-free Knowledge Distillation
- URL: http://arxiv.org/abs/2112.06253v1
- Date: Sun, 12 Dec 2021 14:56:58 GMT
- Title: Up to 100x Faster Data-free Knowledge Distillation
- Authors: Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei
Zhang, Mingli Song
- Abstract summary: We introduce FastDFKD, a scheme that accelerates DFKD by orders of magnitude.
Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features.
FastDFKD achieves data synthesis within only a few steps, significantly enhancing the efficiency of data-free training.
- Score: 52.666615987503995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-free knowledge distillation (DFKD) has recently been attracting
increasing attention from research communities, owing to its ability to
compress a model using only synthetic data. Despite the encouraging results
achieved, state-of-the-art DFKD methods still suffer from the inefficiency of
data synthesis, making the data-free training process extremely time-consuming
and thus inapplicable to large-scale tasks. In this work, we introduce an
efficacious scheme, termed FastDFKD, that allows us to accelerate DFKD by
orders of magnitude. At the heart of our approach is a novel strategy
to reuse the shared common features in training data so as to synthesize
different data instances. Unlike prior methods that optimize a set of data
independently, we propose to learn a meta-synthesizer that seeks common
features as the initialization for the fast data synthesis. As a result,
FastDFKD achieves data synthesis within only a few steps, significantly
enhancing the efficiency of data-free training. Experiments over CIFAR, NYUv2,
and ImageNet demonstrate that the proposed FastDFKD achieves 10$\times$ and
even 100$\times$ acceleration while preserving performance on par with the state
of the art.
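The core idea, learning a common initialization from which each synthetic batch can be reached in only a few gradient steps, can be illustrated with a Reptile-style meta-update. The sketch below is a minimal illustration under assumed components (a toy generator, a cross-entropy inversion loss, a frozen teacher, and illustrative step sizes); it is not the authors' implementation, and FastDFKD's actual objective includes further inversion terms such as batch-normalization statistic matching.

```python
# Minimal sketch of a meta-learned synthesizer for few-step data synthesis,
# in the spirit of FastDFKD.  The generator architecture, the inversion loss,
# and all hyperparameters are illustrative assumptions, not the paper's setup.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerator(nn.Module):
    """Maps latent codes to images; stands in for the meta-synthesizer."""
    def __init__(self, z_dim=100, img_ch=3, img_size=32):
        super().__init__()
        self.shape = (img_ch, img_size, img_size)
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_ch * img_size * img_size), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(z.size(0), *self.shape)

def inversion_loss(teacher, images, targets):
    """Assumed synthesis objective: make the frozen teacher predict the target labels."""
    return F.cross_entropy(teacher(images), targets)

def fast_synthesize(meta_g, teacher, targets, z_dim=100, steps=5, inner_lr=0.01):
    """Few-step adaptation from the meta-initialization (the 'fast' part)."""
    fast_g = copy.deepcopy(meta_g)          # start from the shared common features
    opt = torch.optim.SGD(fast_g.parameters(), lr=inner_lr)
    z = torch.randn(targets.size(0), z_dim)
    for _ in range(steps):
        opt.zero_grad()
        inversion_loss(teacher, fast_g(z), targets).backward()
        opt.step()
    return fast_g, fast_g(z).detach()

def meta_update(meta_g, fast_g, meta_lr=0.1):
    """Reptile-style outer step: move the meta-synthesizer toward the adapted weights."""
    with torch.no_grad():
        for p_meta, p_fast in zip(meta_g.parameters(), fast_g.parameters()):
            p_meta.add_(meta_lr * (p_fast - p_meta))
```

In a full training loop, each batch would call `fast_synthesize` to obtain pseudo data for distillation and `meta_update` to refine the shared initialization; the exact meta-learning objective used in the paper may differ from this Reptile-style approximation.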
Related papers
- Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation [20.556083321381514]
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression.
This paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA).
Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method.
arXiv Detail & Related papers (2024-10-23T07:01:16Z)
- De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts [32.1016787150064]
Data-Free Knowledge Distillation (DFKD) is a promising approach for training high-performance small models for real-world deployment without relying on the original training data.
Existing methods commonly avoid relying on private data by using synthetic or sampled substitute data, which can be distributionally shifted from the original training data.
This paper proposes a causal-inference perspective to disentangle the student model from the impact of such shifts.
arXiv Detail & Related papers (2024-03-28T16:13:22Z)
- Sampling to Distill: Knowledge Transfer from Open-World Data [28.74835717488114]
We propose a novel Open-world Data Sampling Distillation (ODSD) method for the Data-Free Knowledge Distillation (DFKD) task without the redundant generation process.
First, we try to sample open-world data close to the original data's distribution with an adaptive sampling module.
Then, we build structured relationships of multiple data examples to exploit data knowledge through the student model itself and the teacher's structured representation.
arXiv Detail & Related papers (2023-07-31T12:05:55Z)
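For the ODSD entry above, the adaptive sampling step can be pictured as scoring unlabeled open-world candidates with the teacher and keeping those that look closest to the original distribution. The criterion below, low prediction entropy under the teacher, is an assumption chosen for illustration and is not necessarily the module used in the paper.

```python
# Hedged sketch of confidence-based selection of open-world data.
# Using teacher prediction entropy as the "closeness" proxy is an assumption.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_open_world_batch(teacher, candidates, keep_ratio=0.25):
    """Keep the candidates the teacher is most confident about (lowest entropy)."""
    probs = F.softmax(teacher(candidates), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    k = max(1, int(keep_ratio * candidates.size(0)))
    keep = torch.topk(entropy, k, largest=False).indices   # smallest entropy first
    return candidates[keep]
```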
- Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy [20.248947197916642]
We propose a novel DFKD method called CuDFKD.
It teaches the student with a dynamic strategy that gradually generates easy-to-hard pseudo samples.
Experiments show CuDFKD has comparable performance to state-of-the-art (SOTA) DFKD methods on all datasets.
arXiv Detail & Related papers (2022-08-29T14:51:57Z)
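The easy-to-hard idea in the CuDFKD entry above can be sketched as a curriculum weight that slowly shifts the generator objective from samples the teacher handles confidently toward samples on which teacher and student disagree. The specific loss terms and the linear schedule below are illustrative assumptions, not the paper's definition of difficulty.

```python
# Hedged sketch of an easy-to-hard curriculum for pseudo-sample generation.
# The disagreement term and the linear schedule are assumptions for illustration.
import torch.nn.functional as F

def curriculum_weight(step, total_steps):
    """Difficulty weight ramping linearly from 0 (easy) to 1 (hard)."""
    return min(1.0, step / max(1, total_steps))

def generator_loss(teacher, student, images, targets, difficulty):
    """Easy term: teacher confidence on targets.  Hard term: teacher-student disagreement."""
    t_logits, s_logits = teacher(images), student(images)
    easy = F.cross_entropy(t_logits, targets)
    hard = -F.kl_div(F.log_softmax(s_logits, dim=1),
                     F.softmax(t_logits, dim=1), reduction="batchmean")
    return (1.0 - difficulty) * easy + difficulty * hard
```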
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) performs knowledge distillation without depending on the original training data.
We propose a prompt-based method, termed PromptDFD, that takes advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves a linear convergence rate with lower overall complexity and, almost surely, eliminates almost all inactive features within a finite number of iterations.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
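The screening idea behind the DDSS entry above rests on a standard safe rule for l1-regularized problems: coordinates whose dual correlation is provably below one at the optimum must be zero and can be removed early. The sketch below shows a single-machine, duality-gap-based test for the Lasso; DDSS's distributed and dynamic machinery is not reproduced here.

```python
# Single-machine sketch of a gap-based safe screening test for the Lasso
#   min_w 0.5 * ||y - X w||^2 + lam * ||w||_1
# Features flagged True can be provably discarded at the current iterate.
# This is the classical gap-safe sphere test, not the distributed DDSS algorithm.
import numpy as np

def gap_safe_screen(X, y, w, lam):
    residual = y - X @ w
    # Dual-feasible point obtained by rescaling the residual.
    theta = residual / max(lam, np.abs(X.T @ residual).max())
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0     # True => the corresponding weight is zero at the optimum
```

"Dynamic" screening applies such a test repeatedly as the solver tightens the duality gap, shrinking the active feature set on the fly.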
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
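The entry above transfers learned feature representations rather than logits. The generic form of such a loss, a projected L2 match between student and teacher features, is sketched below; the projection head and loss form are standard choices assumed for illustration and are not necessarily the paper's exact objective.

```python
# Generic feature-mimicking distillation loss: project student features to the
# teacher's dimensionality and penalize the L2 distance.  A standard construction,
# assumed here only to illustrate feature-level transfer.
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)   # lightweight alignment head

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())
```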
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense a dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
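The feature alignment at the heart of the CAFE entry above can be pictured as matching layer-wise (multi-scale) feature statistics of a synthetic batch to those of a real batch. The mean-matching loss below is a simplified stand-in; CAFE's full objective contains additional terms omitted here.

```python
# Simplified multi-scale feature alignment between real and synthetic batches:
# match per-layer mean features extracted from several network stages.
import torch

def feature_alignment_loss(real_feats, syn_feats):
    """real_feats / syn_feats: lists of same-shaped feature tensors from several layers."""
    loss = torch.zeros(())
    for fr, fs in zip(real_feats, syn_feats):
        loss = loss + (fr.mean(dim=0) - fs.mean(dim=0)).pow(2).sum()
    return loss
```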
- Making Online Sketching Hashing Even Faster [63.16042585506435]
We present a FasteR Online Sketching Hashing (FROSH) algorithm to sketch the data in a more compact form via an independent transformation.
We provide theoretical justification to guarantee that our proposed FROSH consumes less time and achieves a comparable sketching precision.
We also extend FROSH to its distributed implementation, namely DFROSH, to further reduce the training time cost of FROSH.
arXiv Detail & Related papers (2020-10-10T08:50:53Z)
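The sketching step in the FROSH entry above compresses the data stream with a fast, structured random transform. The subsampled randomized Hadamard transform below is one standard such transform, shown here only as a generic sketching example; it is not claimed to be FROSH's exact construction.

```python
# Generic data sketching with a subsampled randomized Hadamard transform (SRHT):
# n rows are compressed to `sketch_size` rows while roughly preserving geometry.
# Used here only to illustrate sketching with a structured transformation.
import numpy as np

def srht_sketch(X, sketch_size, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = 1 << (n - 1).bit_length()                 # pad rows to a power of two
    H = np.zeros((m, d))
    H[:n] = X * rng.choice([-1.0, 1.0], size=n)[:, None]   # random sign flips
    # In-place fast Walsh-Hadamard transform along the row dimension.
    h = 1
    while h < m:
        for i in range(0, m, 2 * h):
            a = H[i:i + h].copy()
            H[i:i + h] = a + H[i + h:i + 2 * h]
            H[i + h:i + 2 * h] = a - H[i + h:i + 2 * h]
        h *= 2
    rows = rng.choice(m, size=sketch_size, replace=False)  # uniform row subsampling
    return H[rows] / np.sqrt(sketch_size)
```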
This list is automatically generated from the titles and abstracts of the papers on this site.