Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation
- URL: http://arxiv.org/abs/2302.14290v1
- Date: Tue, 28 Feb 2023 03:50:56 GMT
- Title: Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation
- Authors: Gaurav Patel, Konda Reddy Mopuri, Qiang Qiu
- Abstract summary: Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning inspired framework by treating the task of Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously met samples) as meta-train and meta-test.
- Score: 31.294947552032088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-free Knowledge Distillation (DFKD) has gained popularity recently, with
the fundamental idea of carrying out knowledge transfer from a Teacher neural
network to a Student neural network in the absence of training data. However,
in the Adversarial DFKD framework, the student network's accuracy suffers due
to the non-stationary distribution of the pseudo-samples under multiple
generator updates. To this end, at every generator update, we aim to maintain
the student's performance on previously encountered examples while acquiring
knowledge from samples of the current distribution. Thus, we propose a
meta-learning inspired framework by treating the task of Knowledge-Acquisition
(learning from newly generated samples) and Knowledge-Retention (retaining
knowledge on previously met samples) as meta-train and meta-test, respectively.
Hence, we dub our method as Learning to Retain while Acquiring. Moreover, we
identify an implicit aligning factor between the Knowledge-Retention and
Knowledge-Acquisition tasks indicating that the proposed student update
strategy enforces a common gradient direction for both tasks, alleviating
interference between the two objectives. Finally, we support our hypothesis by
exhibiting extensive evaluation and comparison of our method with prior arts on
multiple datasets.
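The student update described above can be read as a first-order, MAML-style step: distill on the freshly generated batch (Knowledge-Acquisition, meta-train), take a virtual inner step, evaluate a retention loss on a memory of earlier pseudo-samples (Knowledge-Retention, meta-test), and apply the combined gradient. The PyTorch-style sketch below is illustrative only; the replay memory memory_x, the first-order approximation, and the hyperparameter names are assumptions rather than the authors' exact algorithm.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, T=1.0):
        # KL divergence between softened teacher and student predictions.
        return (T * T) * F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        )

    def student_meta_update(student, teacher, optimizer, new_x, memory_x, inner_lr=1e-2):
        """Knowledge-Acquisition on fresh pseudo-samples (meta-train), then
        Knowledge-Retention on stored pseudo-samples (meta-test); first-order sketch.
        The teacher is assumed frozen and in eval mode."""
        params = [p for p in student.parameters() if p.requires_grad]

        # Meta-train: acquisition loss on the current generator's samples.
        acq_loss = kd_loss(student(new_x), teacher(new_x).detach())
        acq_grads = torch.autograd.grad(acq_loss, params)

        # Virtual inner step toward the acquisition objective.
        backup = [p.detach().clone() for p in params]
        with torch.no_grad():
            for p, g in zip(params, acq_grads):
                p.sub_(inner_lr * g)

        # Meta-test: retention loss evaluated at the adapted parameters.
        ret_loss = kd_loss(student(memory_x), teacher(memory_x).detach())
        ret_grads = torch.autograd.grad(ret_loss, params)

        # Restore the original parameters and apply the summed gradient,
        # coupling the two objectives into a shared update direction.
        with torch.no_grad():
            for p, b in zip(params, backup):
                p.copy_(b)
        optimizer.zero_grad()
        for p, ga, gr in zip(params, acq_grads, ret_grads):
            p.grad = ga + gr
        optimizer.step()
        return acq_loss.item(), ret_loss.item()

Summing the two gradients is what couples the objectives: per the paper's analysis of the implicit aligning factor, the resulting update favors a direction along which both the acquisition and the retention losses decrease, mitigating interference between them.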
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of the VLMs.
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks (see the sketch below).
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
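One plausible reading of the instance-selection step in the KD$^3$ entry above: keep a webly collected sample only when the teacher and student networks agree on its label with high combined confidence. This is a hedged sketch; the agreement rule, the threshold, and the function names are assumptions, not the paper's exact criterion.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def select_instances(teacher, student, web_batch, conf_threshold=0.8):
        """Keep web-collected samples on which teacher and student agree with
        high combined confidence (illustrative selection rule)."""
        t_prob = F.softmax(teacher(web_batch), dim=1)
        s_prob = F.softmax(student(web_batch), dim=1)
        combined = 0.5 * (t_prob + s_prob)          # combined prediction
        conf, pred = combined.max(dim=1)            # confidence and class
        agree = t_prob.argmax(dim=1) == s_prob.argmax(dim=1)
        keep = (conf > conf_threshold) & agree
        return web_batch[keep], pred[keep]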
- An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding [9.447108578893639]
We consider the joint use of rehearsal and knowledge distillation approaches for spoken language understanding under a class-incremental learning scenario.
We report on multiple KD combinations at different levels in the network, showing that combining feature-level and prediction-level KD leads to the best results (see the sketch below).
arXiv Detail & Related papers (2022-11-15T14:15:22Z)
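A generic form of the best-performing combination reported above (a feature-level term plus a prediction-level term on top of the current-task loss) might look as follows; the MSE/KL choices and the weights alpha and beta are placeholders, not the paper's exact objective.

    import torch
    import torch.nn.functional as F

    def combined_kd_loss(new_logits, old_logits, new_feat, old_feat,
                         targets, alpha=1.0, beta=1.0, T=2.0):
        """Cross-entropy on the current task plus feature-level and
        prediction-level distillation from the previous-task model."""
        ce = F.cross_entropy(new_logits, targets)
        feat_kd = F.mse_loss(new_feat, old_feat.detach())      # feature-level
        pred_kd = (T * T) * F.kl_div(                          # prediction-level
            F.log_softmax(new_logits[:, : old_logits.size(1)] / T, dim=1),
            F.softmax(old_logits.detach() / T, dim=1),
            reduction="batchmean",
        )
        return ce + alpha * feat_kd + beta * pred_kd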
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large number of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor (see the sketch below).
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
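SURF's pseudo-labeling step can be pictured as thresholding the preference predictor's confidence on unlabeled segment pairs; the Bradley-Terry-style predictor interface and the threshold below are assumptions made for illustration.

    import torch

    @torch.no_grad()
    def pseudo_label_preferences(reward_model, seg_a, seg_b, tau=0.9):
        """Assign pseudo preference labels to unlabeled segment pairs when the
        preference predictor is sufficiently confident (illustrative)."""
        # Bradley-Terry style preference probability from summed predicted rewards.
        r_a = reward_model(seg_a).sum(dim=1)   # (batch,) return of segment A
        r_b = reward_model(seg_b).sum(dim=1)   # (batch,) return of segment B
        p_a = torch.sigmoid(r_a - r_b)         # P(A preferred over B)
        conf = torch.maximum(p_a, 1.0 - p_a)
        keep = conf > tau
        labels = (p_a < 0.5).long()            # 0: prefer A, 1: prefer B
        return seg_a[keep], seg_b[keep], labels[keep]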
- Dynamic Knowledge embedding and tracing [18.717482292051788]
We propose a novel approach to knowledge tracing that combines techniques from matrix factorization with recent progress in recurrent neural networks (RNNs).
The proposed DynEmb framework enables the tracking of student knowledge even without concept/skill tag information.
arXiv Detail & Related papers (2020-05-18T21:56:42Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model (see the sketch below).
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
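One way to realize the aggregation step described in the LFME entry above is to distill the unified student toward a weighted mixture of the experts' softened predictions; the uniform weights and the temperature below are placeholders standing in for LFME's self-paced scheme, which the summary does not detail.

    import torch
    import torch.nn.functional as F

    def multi_expert_kd_loss(student_logits, expert_logits_list, weights=None, T=2.0):
        """Distill the unified student toward an aggregated soft target built
        from several expert models (uniform weights as a placeholder)."""
        if weights is None:
            weights = [1.0 / len(expert_logits_list)] * len(expert_logits_list)
        # Weighted mixture of the experts' softened predictions.
        target = sum(w * F.softmax(e.detach() / T, dim=1)
                     for w, e in zip(weights, expert_logits_list))
        return (T * T) * F.kl_div(
            F.log_softmax(student_logits / T, dim=1), target, reduction="batchmean"
        )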
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.