AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation
- URL: http://arxiv.org/abs/2403.07030v2
- Date: Mon, 18 Mar 2024 02:45:04 GMT
- Title: AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation
- Authors: Zihao Tang, Zheqi Lv, Shengyu Zhang, Yifan Zhou, Xinyu Duan, Fei Wu, Kun Kuang
- Abstract summary: Data-Free Knowledge Distillation (DFKD) methods have emerged as direct solutions.
However, simply adopting models derived from DFKD for real-world applications suffers from significant performance degradation.
We propose a simple but effective method AuG-KD to selectively transfer teachers' appropriate knowledge.
- Score: 33.208860361882095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to privacy or patent concerns, a growing number of large models are released without granting access to their training data, making it inefficient and problematic to transfer their knowledge. In response, Data-Free Knowledge Distillation (DFKD) methods have emerged as direct solutions. However, simply adopting models derived from DFKD for real-world applications suffers from significant performance degradation, due to the discrepancy between teachers' training data and real-world scenarios (student domain). The degradation stems from the portions of teachers' knowledge that are not applicable to the student domain. They are specific to the teacher domain and would undermine students' performance. Hence, selectively transferring teachers' appropriate knowledge becomes the primary challenge in DFKD. In this work, we propose a simple but effective method, AuG-KD. It utilizes an uncertainty-guided and sample-specific anchor to align student-domain data with the teacher domain and leverages a generative method to progressively trade off the learning process between OOD knowledge distillation and domain-specific information learning via mixup learning. Extensive experiments on 3 datasets and 8 settings demonstrate the stability and superiority of our approach. Code is available at https://github.com/IshiKura-a/AuG-KD.
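To make the mixup trade-off described in the abstract concrete, here is a minimal, hypothetical sketch (PyTorch-style). It assumes the uncertainty-guided anchors are already produced by some generator, uses hand-picked values for the mixup coefficient `lam` and the weight `beta`, and names the function `aug_kd_step` purely for illustration; it is not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def aug_kd_step(student, teacher, x_student, y, anchor, lam, temperature=4.0, beta=0.5):
    """Illustrative AuG-KD-style step: mix a student-domain batch with anchors
    assumed to lie closer to the teacher domain, distill on the mixed inputs,
    and keep a plain supervised loss on the raw student-domain inputs."""
    # Mixup between the (given) teacher-domain-aligned anchor and the student-domain sample.
    x_mix = lam * anchor + (1.0 - lam) * x_student

    # OOD knowledge distillation on the mixed input.
    with torch.no_grad():
        t_logits = teacher(x_mix)
    s_logits_mix = student(x_mix)
    kd_loss = F.kl_div(
        F.log_softmax(s_logits_mix / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Domain-specific information learning on labeled student-domain data.
    ce_loss = F.cross_entropy(student(x_student), y)

    return beta * kd_loss + (1.0 - beta) * ce_loss
```

The "progressive" trade-off mentioned in the abstract would correspond to scheduling `lam` (and possibly `beta`) over training so that learning shifts from teacher-aligned mixed samples toward raw student-domain data; the exact schedule and the anchor/generator design are what the paper itself specifies.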
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution (a toy sketch of this interleaving appears after this list).
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher [52.2926020848095]
Federated learning is vulnerable to white-box attacks and struggles to adapt to heterogeneous clients.
This paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD.
arXiv Detail & Related papers (2023-04-04T12:04:19Z)
- Improved knowledge distillation by utilizing backward pass knowledge in neural networks [17.437510399431606]
Knowledge distillation (KD) is one of the prominent techniques for model compression.
In this work, we generate new auxiliary training samples based on extracting knowledge from the backward pass of the teacher.
We show how this technique can be used successfully in applications of natural language processing (NLP) and language understanding.
arXiv Detail & Related papers (2023-01-27T22:07:38Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD), which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Data-Free Knowledge Transfer: A Survey [13.335198869928167]
Knowledge distillation (KD) and domain adaptation (DA) have been proposed and have become research highlights.
They both aim to transfer useful information from a well-trained model with original training data.
Recently, the data-free knowledge transfer paradigm has attracted increasing attention.
arXiv Detail & Related papers (2021-12-31T03:39:42Z)
- Semi-Online Knowledge Distillation [2.373824287636486]
Conventional knowledge distillation (KD) transfers knowledge from a large and well pre-trained teacher network to a small student network (a generic form of this objective is sketched after this list).
Deep mutual learning (DML) has been proposed to help student networks learn collaboratively and simultaneously.
We propose a Semi-Online Knowledge Distillation (SOKD) method that effectively improves the performance of the student and the teacher.
arXiv Detail & Related papers (2021-11-23T09:44:58Z)
- Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data [56.29595334715237]
Knowledge distillation (KD) aims to craft a compact student model that imitates the behavior of a pre-trained teacher in a target domain.
We introduce a handy yet surprisingly efficacious approach, dubbed MosaicKD.
In MosaicKD, this is achieved through a four-player min-max game in which a generator, a discriminator, and a student network are collectively trained in an adversarial manner, partially under the guidance of a pre-trained teacher.
arXiv Detail & Related papers (2021-10-27T13:01:10Z)
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
Our proposed method, FRSKD, can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
arXiv Detail & Related papers (2021-03-15T10:59:43Z)
- Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment data for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z)
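For reference, the conventional KD objective mentioned in the Semi-Online Knowledge Distillation entry above is, in its generic Hinton-style form, a weighted sum of a temperature-softened KL term against the teacher and a hard-label cross-entropy term. The sketch below is that textbook formulation, not the exact loss of any paper listed here.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Generic knowledge distillation loss: soft teacher matching + hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients are comparable to the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```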
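As a toy illustration of the interleaved sampling described in the Speculative Knowledge Distillation entry above, the loop below lets the student propose each next token and the teacher overwrite proposals it ranks poorly. The model interface (token ids in, [batch, seq, vocab] logits out), the greedy student proposal, and the top-k acceptance threshold are all assumptions made for this sketch; it is not the authors' SKD code.

```python
import torch

@torch.no_grad()
def interleaved_sample(student, teacher, prefix, max_new_tokens=32, top_k=25):
    """prefix: 1 x T LongTensor of token ids; student/teacher map ids -> logits."""
    tokens = prefix.clone()
    for _ in range(max_new_tokens):
        s_logits = student(tokens)[:, -1, :]   # student's next-token logits
        proposal = s_logits.argmax(dim=-1)     # student proposes a token (greedy here)
        t_logits = teacher(tokens)[:, -1, :]
        # Rank of the proposal under the teacher's own distribution (0 = teacher's top choice).
        rank = (t_logits > t_logits.gather(-1, proposal[:, None])).sum(-1)
        if rank.item() >= top_k:
            # Teacher replaces a poorly ranked proposal with a sample from its distribution.
            proposal = torch.multinomial(torch.softmax(t_logits, dim=-1), 1).squeeze(-1)
        tokens = torch.cat([tokens, proposal[:, None]], dim=1)
    return tokens
```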