Black-box Few-shot Knowledge Distillation
- URL: http://arxiv.org/abs/2207.12106v1
- Date: Mon, 25 Jul 2022 12:16:53 GMT
- Title: Black-box Few-shot Knowledge Distillation
- Authors: Dang Nguyen, Sunil Gupta, Kien Do, Svetha Venkatesh
- Abstract summary: Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network.
We propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher.
We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks.
- Score: 55.27881513982002
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) is an efficient approach to transfer the
knowledge from a large "teacher" network to a smaller "student" network.
Traditional KD methods require lots of labeled training samples and a white-box
teacher (parameters are accessible) to train a good student. However, these
resources are not always available in real-world applications. The distillation
process often happens on an external party's side, where we do not have access to
much data, and the teacher does not disclose its parameters due to security and
privacy concerns. To overcome these challenges, we propose a black-box few-shot
KD method to train the student with few unlabeled training samples and a
black-box teacher. Our main idea is to expand the training set by generating a
diverse set of out-of-distribution synthetic images using MixUp and a
conditional variational auto-encoder. These synthetic images along with their
labels obtained from the teacher are used to train the student. We conduct
extensive experiments to show that our method significantly outperforms recent
SOTA few/zero-shot KD methods on image classification tasks. The code and
models are available at: https://github.com/nphdang/FS-BBT
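To make the abstract's pipeline concrete, here is a minimal sketch under stated assumptions: the few unlabeled images are expanded with MixUp, the black-box teacher is treated as an opaque callable (`teacher_api`, a name assumed here) that returns class probabilities, and the student is trained on those soft labels with a KL loss. The CVAE-based generation step and the paper's actual hyperparameters are omitted; the official implementation at https://github.com/nphdang/FS-BBT is the reference.

```python
# Minimal sketch of the black-box few-shot KD pipeline described in the abstract.
# Assumptions: `teacher_api` is an opaque callable returning class probabilities,
# MixUp expands the few unlabeled samples, and the CVAE generation step is omitted.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def mixup_expand(images: torch.Tensor, n_pairs: int, alpha: float = 1.0) -> torch.Tensor:
    """Create synthetic (possibly out-of-distribution) images by mixing random pairs."""
    idx_a = torch.randint(len(images), (n_pairs,))
    idx_b = torch.randint(len(images), (n_pairs,))
    lam = torch.distributions.Beta(alpha, alpha).sample((n_pairs, 1, 1, 1))
    return lam * images[idx_a] + (1.0 - lam) * images[idx_b]

def distill_black_box(teacher_api, student, few_images, n_synthetic=10_000,
                      epochs=50, batch_size=128, lr=1e-3, device="cpu"):
    # 1) Expand the tiny unlabeled set with MixUp (the paper additionally uses a CVAE).
    pool = torch.cat([few_images, mixup_expand(few_images, n_synthetic)], dim=0)

    # 2) Query the black-box teacher once per image; only its outputs are used,
    #    never its parameters (batching the queries would be faster).
    with torch.no_grad():
        teacher_probs = torch.cat([teacher_api(x.unsqueeze(0)) for x in pool], dim=0)

    # 3) Train the student to match the teacher's predictions (soft-label KL loss).
    loader = DataLoader(TensorDataset(pool, teacher_probs), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    student.to(device).train()
    for _ in range(epochs):
        for x, p_teacher in loader:
            x, p_teacher = x.to(device), p_teacher.to(device)
            loss = F.kl_div(F.log_softmax(student(x), dim=1), p_teacher, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```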
Related papers
- Data-Free Knowledge Distillation Using Adversarially Perturbed OpenGL Shader Images [5.439020425819001]
Knowledge distillation (KD) has been a popular and effective method for model compression.
"Data-free" KD has emerged as a growing research topic which focuses on the scenario of performing KD when no data is provided.
We propose a new approach to data-free KD that utilizes unnatural images, combined with large amounts of data augmentation and adversarial attacks.
arXiv Detail & Related papers (2023-10-20T19:28:50Z)
- Improved knowledge distillation by utilizing backward pass knowledge in neural networks [17.437510399431606]
Knowledge distillation (KD) is one of the prominent techniques for model compression.
In this work, we generate new auxiliary training samples based on extracting knowledge from the backward pass of the teacher.
We show how this technique can be used successfully in applications of natural language processing (NLP) and language understanding.
arXiv Detail & Related papers (2023-01-27T22:07:38Z)
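The entry above only summarizes the idea of extracting knowledge from the teacher's backward pass. One hedged, hypothetical way to read it is to perturb existing inputs along the gradient that increases teacher-student disagreement, yielding auxiliary samples near the decision boundary; the function below illustrates that reading, not the paper's exact procedure, and the step count and step size are assumptions.

```python
# Hypothetical reading of "backward-pass knowledge": nudge inputs along the gradient
# that increases teacher-student disagreement to create auxiliary training samples
# near the decision boundary. Not the paper's exact procedure.
import torch
import torch.nn.functional as F

def auxiliary_samples(teacher, student, x, steps=3, step_size=0.01):
    teacher.eval()
    student.eval()
    x_aux = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        # Divergence between the student's and the teacher's current predictions.
        div = F.kl_div(F.log_softmax(student(x_aux), dim=1),
                       F.softmax(teacher(x_aux), dim=1),
                       reduction="batchmean")
        grad, = torch.autograd.grad(div, x_aux)
        # Move the inputs toward regions where the two networks disagree most.
        x_aux = (x_aux + step_size * grad.sign()).detach().requires_grad_(True)
    return x_aux.detach()
```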
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation [4.182345120164705]
This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD)
CES-KD is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum.
Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty in classifying the image.
arXiv Detail & Related papers (2022-09-15T21:02:57Z)
- Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model [8.87104231451079]
Knowledge distillation is a successful approach for deep neural network acceleration.
Traditionally, KD relies on access to the training samples and the parameters of a white-box teacher to acquire the transferred knowledge.
Here we propose the concept of decision-based black-box (DB3) knowledge distillation, with which the student is trained by distilling the knowledge from a black-box teacher.
arXiv Detail & Related papers (2021-06-07T02:46:31Z)
- Undistillable: Making A Nasty Teacher That CANNOT teach students [84.6111281091602]
This paper introduces and investigates a concept called Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, yet severely degrades the performance of any student network that tries to distill knowledge from it.
We propose a simple yet effective algorithm to build the nasty teacher, called self-undermining knowledge distillation.
arXiv Detail & Related papers (2021-05-16T08:41:30Z)
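For the Nasty Teacher entry above, a hedged sketch of what a "self-undermining" objective could look like: the teacher stays accurate on the hard labels while pushing its softened outputs away from those of a normally trained reference network, so students imitating it inherit misleading soft targets. The loss form, temperature, and weight `omega` are illustrative assumptions, not necessarily the paper's formulation.

```python
# Illustrative "self-undermining" objective: keep the teacher accurate on hard labels
# while pushing its softened outputs away from a normally trained reference network,
# so that students imitating it receive misleading soft targets. Loss form,
# temperature, and `omega` are assumptions, not necessarily the paper's values.
import torch.nn.functional as F

def self_undermining_loss(nasty_logits, reference_logits, labels, temperature=4.0, omega=0.01):
    # Stay accurate on the true labels...
    ce = F.cross_entropy(nasty_logits, labels)
    # ...while maximizing divergence from the reference network's soft predictions.
    kl = F.kl_div(F.log_softmax(nasty_logits / temperature, dim=1),
                  F.softmax(reference_logits.detach() / temperature, dim=1),
                  reduction="batchmean")
    return ce - omega * kl
```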
- Progressive Network Grafting for Few-Shot Knowledge Distillation [60.38608462158474]
We introduce a principled dual-stage distillation scheme tailored for few-shot data.
In the first step, we graft the student blocks one by one onto the teacher, and learn the parameters of the grafted block intertwined with those of the other teacher blocks.
Experiments demonstrate that our approach, with only a few unlabeled samples, achieves gratifying results on CIFAR10, CIFAR100, and ILSVRC-2012.
arXiv Detail & Related papers (2020-12-09T08:34:36Z)
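A minimal sketch of the grafting scheme summarized above, under the assumption that teacher and student are aligned lists of blocks: each student block is grafted into the teacher in turn, and the grafted prefix is trained on the few available samples to reproduce the frozen teacher's outputs. Block boundaries, the MSE matching loss, and the single-batch loop are assumptions for illustration.

```python
# Sketch of progressive grafting: student blocks replace teacher blocks one by one,
# and the grafted prefix is trained on the few available samples to reproduce the
# frozen teacher's outputs. Block alignment and the MSE loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def graft_progressively(teacher_blocks, student_blocks, few_images, epochs=100, lr=1e-3):
    assert len(teacher_blocks) == len(student_blocks)
    for block in teacher_blocks:
        block.eval()
        for p in block.parameters():
            p.requires_grad_(False)
    with torch.no_grad():
        target = nn.Sequential(*teacher_blocks)(few_images)  # teacher's reference outputs

    grafted = []
    for i, s_block in enumerate(student_blocks):
        grafted.append(s_block.train())
        # Hybrid network: already-grafted student blocks followed by the remaining teacher blocks.
        hybrid = nn.Sequential(*grafted, *list(teacher_blocks)[i + 1:])
        opt = torch.optim.Adam(nn.ModuleList(grafted).parameters(), lr=lr)
        for _ in range(epochs):
            loss = F.mse_loss(hybrid(few_images), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return nn.Sequential(*grafted)  # the fully assembled student
```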
- Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
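The last entry above blends mixup with active learning for black-box distillation. The following hedged sketch shows one round of that loop under assumed names (`teacher_api`, `select_queries`) and an assumed uncertainty criterion (lowest student confidence): mixup candidates are synthesized from a small image pool, only the least-confident ones are sent to the black-box teacher for labels, and the labeled pairs are returned for the student's next distillation step (training loop not shown).

```python
# One round of "active mixup" for black-box distillation: synthesize mixup candidates
# from a small pool, query the black-box teacher only on the candidates the student
# is least confident about, and return the labeled pairs for the next distillation
# step. Names and the uncertainty criterion are assumptions for illustration.
import torch
import torch.nn.functional as F

def select_queries(student, candidates, budget):
    """Pick candidates on which the student is most uncertain (lowest max probability)."""
    student.eval()
    with torch.no_grad():
        confidence = F.softmax(student(candidates), dim=1).max(dim=1).values
    return candidates[confidence.argsort()[:budget]]

def active_mixup_round(teacher_api, student, images, n_candidates=5000, budget=500, alpha=1.0):
    # 1) Synthesize mixup candidates from the small unlabeled image set.
    i = torch.randint(len(images), (n_candidates,))
    j = torch.randint(len(images), (n_candidates,))
    lam = torch.distributions.Beta(alpha, alpha).sample((n_candidates, 1, 1, 1))
    candidates = lam * images[i] + (1.0 - lam) * images[j]

    # 2) Actively choose which candidates are worth a (costly) black-box teacher query.
    queries = select_queries(student, candidates, budget)

    # 3) Label the selected images with the teacher; the student is then trained on
    #    these (image, soft label) pairs (training loop not shown).
    with torch.no_grad():
        labels = torch.cat([teacher_api(x.unsqueeze(0)) for x in queries], dim=0)
    return queries, labels
```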
This list is automatically generated from the titles and abstracts of the papers on this site.