Hybrid Data-Free Knowledge Distillation
- URL: http://arxiv.org/abs/2412.13525v1
- Date: Wed, 18 Dec 2024 05:52:16 GMT
- Title: Hybrid Data-Free Knowledge Distillation
- Authors: Jialiang Tang, Shuo Chen, Chen Gong,
- Abstract summary: We propose a data-free knowledge distillation method called Hybrid Data-Free Distillation (HiDFD).
Our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods.
- Score: 11.773963069904955
- Abstract: Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generation-based methods train student networks by collecting massive real examples and generating synthetic examples, respectively. However, they inevitably become weak in practical scenarios due to the difficulties in gathering or emulating sufficient real-world data. To solve this problem, we propose a novel method called Hybrid Data-Free Distillation (HiDFD), which leverages only a small amount of collected data as well as generates sufficient examples for training student networks. Our HiDFD comprises two primary modules, i.e., the teacher-guided generation and student distillation. The teacher-guided generation module guides a Generative Adversarial Network (GAN) by the teacher network to produce high-quality synthetic examples from very few real-world collected examples. Specifically, we design a feature integration mechanism to prevent the GAN from overfitting and facilitate reliable representation learning from the teacher network. Meanwhile, we drive a category frequency smoothing technique via the teacher network to balance the generative training of each category. In the student distillation module, we explore a data inflation strategy to properly utilize a blend of real and synthetic data to train the student network via a classifier-sharing-based feature alignment technique. Intensive experiments across multiple benchmarks demonstrate that our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods. Code is available at https://github.com/tangjialiang97/HiDFD.
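Below is a minimal, hedged sketch (PyTorch-style Python) of the hybrid idea the abstract describes: a generator guided only by the frozen teacher synthesizes extra examples that inflate a small pool of collected real data, and the compact student is then distilled on the real/synthetic mixture. All names and values here (`mlp`, `teacher`, `student`, `generator`, the loss terms, the 1:3 inflation ratio, the temperature) are illustrative assumptions, not the authors' implementation; the actual feature integration, category frequency smoothing, and classifier-sharing feature alignment components live in the official repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, FEAT_DIM, NUM_CLASSES, T = 64, 128, 10, 4.0  # T: distillation temperature (assumed)

def mlp(d_in, d_hidden, d_out):
    """Tiny stand-in for real teacher/student/generator backbones."""
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

teacher = mlp(FEAT_DIM, 256, NUM_CLASSES).requires_grad_(False).eval()  # pre-trained, frozen
student = mlp(FEAT_DIM, 64, NUM_CLASSES)                                # compact student
generator = mlp(NOISE_DIM, 256, FEAT_DIM)                               # generator (GAN discriminator omitted)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

collected = torch.randn(32, FEAT_DIM)  # the *few* collected real-world examples (toy data here)

for step in range(200):
    # (1) Teacher-guided generation: make the frozen teacher confident on each
    #     synthetic sample while keeping class usage balanced across the batch
    #     (a crude stand-in for the paper's category frequency smoothing).
    z = torch.randn(64, NOISE_DIM)
    synthetic = generator(z)
    probs = F.softmax(teacher(synthetic), dim=1)
    sample_entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()  # minimize -> confident samples
    mean_probs = probs.mean(dim=0)
    neg_batch_entropy = (mean_probs * mean_probs.clamp_min(1e-8).log()).sum()  # minimize -> balanced classes
    opt_g.zero_grad()
    (sample_entropy + neg_batch_entropy).backward()
    opt_g.step()

    # (2) Student distillation on the inflated mixture of real and synthetic data.
    real_batch = collected[torch.randint(len(collected), (16,))]
    inflated = torch.cat([real_batch, synthetic.detach()[:48]], dim=0)  # assumed 1:3 real-to-synthetic mix
    with torch.no_grad():
        t_soft = F.softmax(teacher(inflated) / T, dim=1)
    s_log_soft = F.log_softmax(student(inflated) / T, dim=1)
    kd_loss = F.kl_div(s_log_soft, t_soft, reduction="batchmean") * T * T
    opt_s.zero_grad()
    kd_loss.backward()
    opt_s.step()
```

The entropy-based generator terms above are a generic surrogate for teacher guidance and class balancing; HiDFD's own generation module is GAN-based and considerably more elaborate.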
Related papers
- Sampling to Distill: Knowledge Transfer from Open-World Data [28.74835717488114]
We propose a novel Open-world Data Sampling Distillation (ODSD) method for the Data-Free Knowledge Distillation (DFKD) task without the redundant generation process.
First, we try to sample open-world data close to the original data's distribution by an adaptive sampling module.
Then, we build structured relationships of multiple data examples to exploit data knowledge through the student model itself and the teacher's structured representation.
arXiv Detail & Related papers (2023-07-31T12:05:55Z) - Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD³).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of teacher network and student network.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z) - EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Synthetic data generation method for data-free knowledge distillation in regression neural networks [0.0]
Knowledge distillation is the technique of compressing a larger neural network, known as the teacher, into a smaller neural network, known as the student.
Previous work has proposed a data-free knowledge distillation method where synthetic data are generated using a generator model trained adversarially against the student model.
In this study, we investigate the behavior of various synthetic data generation methods and propose a new synthetic data generation strategy.
arXiv Detail & Related papers (2023-01-11T07:26:00Z) - Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
Specifically, we design a neural-network-based data augmentation module with a prior bias, which helps find samples that match the teacher's strengths but expose the student's weaknesses.
arXiv Detail & Related papers (2022-12-11T06:22:14Z) - Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay [49.691610143011566]
We propose two novel knowledge transfer techniques for class-incremental learning (CIL).
First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model.
Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student.
arXiv Detail & Related papers (2021-06-17T22:13:15Z) - Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
Specifically, we use a generator to create samples that mimic the original training data through dual-discriminator adversarial distillation.
The proposed method obtains an efficient student network that closely approximates its teacher network despite using no original training data; a generic code sketch of this generator-versus-student objective appears after this list.
arXiv Detail & Related papers (2021-04-12T12:01:45Z) - Progressive Network Grafting for Few-Shot Knowledge Distillation [60.38608462158474]
We introduce a principled dual-stage distillation scheme tailored for few-shot data.
In the first step, we graft the student blocks one by one onto the teacher, and learn the parameters of the grafted block intertwined with those of the other teacher blocks.
Experiments demonstrate that our approach, with only a few unlabeled samples, achieves gratifying results on CIFAR10, CIFAR100, and ILSVRC-2012.
arXiv Detail & Related papers (2020-12-09T08:34:36Z) - Generative Adversarial Simulator [2.3986080077861787]
We introduce a simulator-free approach to knowledge distillation in the context of reinforcement learning.
A key challenge is having the student learn the multiplicity of cases that correspond to a given action.
This is the first demonstration of simulator-free knowledge distillation between a teacher and a student policy.
arXiv Detail & Related papers (2020-11-23T15:31:12Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z) - Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
Without any training data, the proposed method achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)
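Several of the generation-based entries above (e.g. DDAD, the regression DFKD paper, DF-GR) build on a common generator-versus-student adversarial objective: the generator searches for samples on which the teacher and student disagree, and the student then learns to match the teacher on those samples. The sketch below illustrates that generic objective under assumed toy architectures and hyperparameters; it is not the implementation of any single paper listed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, FEAT_DIM, NUM_CLASSES = 64, 128, 10  # toy sizes, assumed for illustration

def mlp(d_in, d_hidden, d_out):
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

teacher = mlp(FEAT_DIM, 256, NUM_CLASSES).requires_grad_(False).eval()  # pre-trained, frozen
student = mlp(FEAT_DIM, 64, NUM_CLASSES)
generator = mlp(NOISE_DIM, 256, FEAT_DIM)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(200):
    # Generator step: ascend on the teacher-student discrepancy, i.e. search for
    # pseudo-samples the student has not yet learned to imitate.
    fake = generator(torch.randn(64, NOISE_DIM))
    gap = F.l1_loss(student(fake), teacher(fake))
    opt_g.zero_grad()
    (-gap).backward()
    opt_g.step()

    # Student step: descend on the same discrepancy over freshly generated samples,
    # pulling the student's outputs toward the teacher's.
    with torch.no_grad():
        fake = generator(torch.randn(64, NOISE_DIM))
        t_out = teacher(fake)
    s_loss = F.l1_loss(student(fake), t_out)
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
```

The L1 logit gap is one possible discrepancy measure; individual papers differ in the discrepancy used, the discriminator design, and how (or whether) any real data enters the loop.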
This list is automatically generated from the titles and abstracts of the papers in this site.