Distilling and Transferring Knowledge via cGAN-generated Samples for
  Image Classification and Regression
- URL: http://arxiv.org/abs/2104.03164v1
- Date: Wed, 7 Apr 2021 14:52:49 GMT
- Title: Distilling and Transferring Knowledge via cGAN-generated Samples for
  Image Classification and Regression
- Authors: Xin Ding and Z. Jane Wang and Zuheng Xu and Yongwei Wang and William
  J. Welch
- Abstract summary: We propose a unified KD framework based on conditional generative adversarial networks (cGANs).
cGAN-KD distills and transfers knowledge from a teacher model to a student model via cGAN-generated samples.
Experiments on CIFAR-10 and Tiny-ImageNet show we can incorporate KD methods into the cGAN-KD framework to reach a new state of the art.
- Score: 17.12028267150745
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract:   Knowledge distillation (KD) has been actively studied for image
classification tasks in deep learning, aiming to improve the performance of a
student model based on the knowledge from a teacher model. However, there have
been very few efforts for applying KD in image regression with a scalar
response, and there is no KD method applicable to both tasks. Moreover,
existing KD methods often require a practitioner to carefully choose or adjust
the teacher and student architectures, making these methods less scalable in
practice. Furthermore, although KD is usually conducted in scenarios with
limited labeled data, very few techniques are developed to alleviate such data
insufficiency. To solve the above problems in an all-in-one manner, we propose
in this paper a unified KD framework based on conditional generative
adversarial networks (cGANs), termed cGAN-KD. Fundamentally different from
existing KD methods, cGAN-KD distills and transfers knowledge from a teacher
model to a student model via cGAN-generated samples. This unique mechanism
makes cGAN-KD suitable for both classification and regression tasks, compatible
with other KD methods, and insensitive to the teacher and student
architectures. Also, benefiting from the recent advances in cGAN methodology
and our specially designed subsampling and filtering procedures, cGAN-KD also
performs well when labeled data are scarce. An error bound of a student model
trained in the cGAN-KD framework is derived in this work, which theoretically
explains why cGAN-KD takes effect and guides the implementation of cGAN-KD in
practice. Extensive experiments on CIFAR-10 and Tiny-ImageNet show that we can
incorporate state-of-the-art KD methods into the cGAN-KD framework to reach a
new state of the art. Also, experiments on RC-49 and UTKFace demonstrate the
effectiveness of cGAN-KD in image regression tasks, where existing KD methods
are inapplicable.
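
The data flow described in the abstract (generate samples with a cGAN, filter them, label them with the teacher, train the student on the enlarged set) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_cgan` and `toy_teacher` are hypothetical stand-ins for trained networks, the nearest-centroid student is a placeholder, and the filtering rule (drop a sample when the teacher disagrees with the conditioning label) is one plausible reading of "filtering". The subsampling step mentioned in the abstract is omitted.

```python
import random


def toy_cgan(label, rng):
    """Hypothetical stand-in for a trained cGAN generator.

    Returns a 1-D "image" feature scattered around the conditioning label.
    """
    return label + rng.gauss(0.0, 0.6)


def toy_teacher(x):
    """Hypothetical stand-in for a trained teacher classifier on classes 0..2."""
    return min(2, max(0, round(x)))


def build_augmented_set(real_data, n_fake_per_class, classes, seed=0):
    """Generate fake samples per class, filter them, and label with the teacher."""
    rng = random.Random(seed)
    kept = []
    for y in classes:
        for _ in range(n_fake_per_class):
            x = toy_cgan(y, rng)
            y_teacher = toy_teacher(x)
            # Assumed filtering rule: drop samples on which the teacher
            # disagrees with the label the generator was conditioned on.
            if y_teacher == y:
                kept.append((x, y_teacher))
    return real_data + kept, kept


def fit_student(data):
    """Toy student: a nearest-centroid classifier (one mean per class)."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(student, x):
    """Return the class whose centroid is closest to x."""
    return min(student, key=lambda y: abs(student[y] - x))


real = [(0.1, 0), (1.2, 1), (1.9, 2)]  # tiny "scarce" labeled set
augmented, kept = build_augmented_set(real, 50, classes=[0, 1, 2])
student = fit_student(augmented)
print(len(real), len(kept), predict(student, 0.9))
```

In this sketch the student never sees the teacher's architecture or internal outputs, only samples and labels, which is the sense in which a sample-based transfer could be insensitive to the teacher and student architectures. For a regression task, the equality check in the filter would presumably be replaced by a threshold on the gap between the teacher's prediction and the conditioning label; that too is an assumption here.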
 
      
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
 We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
 arXiv (2024-10-15T06:51:25Z)
- Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching [0.09999629695552192]
 The Correlation Matching Knowledge Distillation (CMKD) method combines KD losses based on the Pearson and Spearman correlation coefficients to achieve more efficient and robust distillation from a stronger teacher model.
 CMKD is simple yet practical, and extensive experiments demonstrate that it can consistently achieve state-of-the-art performance on CIFAR-100 and ImageNet.
 arXiv (2024-10-09T05:42:47Z)
- Revisiting Knowledge Distillation for Autoregressive Language Models [88.80146574509195]
 We propose a simple yet effective adaptive teaching approach (ATKD) to improve knowledge distillation (KD).
The core of ATKD is to reduce rote learning and make teaching more diverse and flexible.
Experiments on 8 LM tasks show that, with the help of ATKD, various baseline KD methods can achieve consistent and significant performance gains.
 arXiv (2024-02-19T07:01:10Z)
- Comparative Knowledge Distillation [102.35425896967791]
 Traditional Knowledge Distillation (KD) assumes readily available access to teacher models for frequent inference.
We propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples.
CKD consistently outperforms state-of-the-art data augmentation and KD techniques.
 arXiv (2023-11-03T21:55:33Z)
- Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free
  Continual Learning [14.379472108242235]
 We investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy.
KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks.
Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main models during incremental training.
 arXiv (2023-08-18T13:22:59Z)
- How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
 This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
 arXiv (2021-10-22T21:30:53Z)
- Confidence Conditioned Knowledge Distillation [8.09591217280048]
 A confidence conditioned knowledge distillation (CCKD) scheme for transferring the knowledge from a teacher model to a student model is proposed.
CCKD addresses these issues by leveraging the confidence assigned by the teacher model to the correct class to devise sample-specific loss functions and targets.
 Empirical evaluations on several benchmark datasets show that CCKD methods generalize at least as well as other state-of-the-art methods.
 arXiv (2021-07-06T00:33:25Z)
- KDExplainer: A Task-oriented Attention Model for Explaining Knowledge
  Distillation [59.061835562314066]
 We introduce a novel task-oriented attention model, termed KDExplainer, to shed light on the working mechanism underlying vanilla KD.
We also introduce a portable tool, dubbed the virtual attention module (VAM), that can be seamlessly integrated with various deep neural networks (DNNs) to enhance their performance under KD.
 arXiv (2021-05-10T08:15:26Z)
- Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
 We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
 arXiv (2020-05-02T06:56:56Z)
- Modeling Teacher-Student Techniques in Deep Neural Networks for
  Knowledge Distillation [9.561123408923489]
 Knowledge distillation (KD) is a new method for transferring knowledge of a structure under training to another one.
In this paper, various studies in the scope of KD are investigated and analyzed to build a general model for KD.
The advantages and disadvantages of different approaches in KD can then be better understood, making it possible to develop new strategies for KD.
 arXiv (2019-12-31T05:32:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     