Beyond Classification: Knowledge Distillation using Multi-Object
Impressions
- URL: http://arxiv.org/abs/2110.14215v1
- Date: Wed, 27 Oct 2021 06:59:27 GMT
- Title: Beyond Classification: Knowledge Distillation using Multi-Object
Impressions
- Authors: Gaurav Kumar Nayak, Monish Keswani, Sharan Seshadri, Anirban
Chakraborty
- Abstract summary: Knowledge Distillation (KD) utilizes training data as a transfer set to transfer knowledge from a complex network (Teacher) to a smaller network (Student).
Several works have recently identified many scenarios where the training data may not be available due to data privacy or sensitivity concerns.
We, for the first time, solve a much more challenging problem, i.e., "KD for object detection with zero knowledge about the training data and its statistics".
- Score: 17.214664783818687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Distillation (KD) utilizes training data as a transfer set to
transfer knowledge from a complex network (Teacher) to a smaller network
(Student). Several works have recently identified many scenarios where the
training data may not be available due to data privacy or sensitivity concerns
and have proposed solutions under this restrictive constraint for the
classification task. Unlike existing works, we, for the first time, solve a
much more challenging problem, i.e., "KD for object detection with zero
knowledge about the training data and its statistics". Our proposed approach
prepares pseudo-targets and synthesizes corresponding samples (termed as
"Multi-Object Impressions"), using only the pretrained Faster RCNN Teacher
network. We use this pseudo-dataset as a transfer set to conduct zero-shot KD
for object detection. We demonstrate the efficacy of our proposed method
through several ablations and extensive experiments on benchmark datasets like
KITTI, Pascal and COCO. With no training samples, our approach achieves a
respectable mAP of 64.2% and 55.5% on students of the same and half the
Teacher's capacity, respectively, while distilling from a ResNet-18 Teacher
with 73.3% mAP on KITTI.
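The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch (not the authors' code) of the two stages it names: synthesizing Multi-Object Impressions by optimizing noise images until the pretrained Faster RCNN Teacher assigns low detection loss to randomly sampled pseudo-targets, and then training a student detector on that pseudo-dataset. The pseudo-target sampler, the torchvision stand-in models, and all hyperparameters are placeholder assumptions (torchvision >= 0.13 API); the paper's actual sampling statistics and distillation losses are not reproduced.
```python
import torch
import torchvision


def sample_pseudo_targets(num_classes, num_objects=3, img_size=512):
    """Randomly place a few boxes with random class labels as pseudo ground truth.

    Placeholder sampler: the paper constructs pseudo-targets more carefully
    than uniform random boxes and labels.
    """
    xy = torch.rand(num_objects, 2) * (img_size - 64)            # top-left corners
    wh = 32 + torch.rand(num_objects, 2) * (img_size / 2 - 32)   # widths / heights
    boxes = torch.cat([xy, (xy + wh).clamp(max=img_size - 1)], dim=1)
    labels = torch.randint(1, num_classes + 1, (num_objects,))   # label 0 is background
    return {"boxes": boxes, "labels": labels}


def synthesize_impression(teacher, target, img_size=512, steps=200, lr=0.05):
    """Optimize a noise image until the Teacher 'detects' the pseudo-targets in it."""
    teacher.train()  # torchvision detection models return losses only in train mode
    img = torch.rand(3, img_size, img_size, requires_grad=True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        loss_dict = teacher([img.clamp(0, 1)], [target])
        loss = sum(loss_dict.values())   # RPN + classification + box-regression losses
        opt.zero_grad()
        loss.backward()                  # gradients flow to the image, not the Teacher
        opt.step()
    return img.detach().clamp(0, 1)


# Stand-in Teacher (the paper uses a Faster RCNN with a ResNet-18 backbone).
teacher = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
for p in teacher.parameters():
    p.requires_grad_(False)
num_classes = teacher.roi_heads.box_predictor.cls_score.out_features - 1

# Build a tiny pseudo-dataset ("transfer set") of impressions and their pseudo-targets.
transfer_set = []
for _ in range(4):                       # the paper synthesizes far more samples
    target = sample_pseudo_targets(num_classes)
    transfer_set.append((synthesize_impression(teacher, target, steps=20), target))

# Simplified distillation: train a smaller student detector on the impressions.
student = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(
    num_classes=num_classes + 1)
student.train()
opt = torch.optim.SGD(student.parameters(), lr=1e-3, momentum=0.9)
for img, target in transfer_set:
    losses = student([img], [target])
    loss = sum(losses.values())
    opt.zero_grad()
    loss.backward()
    opt.step()
```
In the paper the student is itself a Faster RCNN distilled using the Teacher's outputs over a much larger set of impressions; the loop above trains on the pseudo-targets directly only to keep the sketch short.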
Related papers
- Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks [10.932880269282014]
We propose the first effective DD method for SSL pre-training.
Specifically, we train a small student model to match the representations of a larger teacher model trained with SSL.
As the KD objective has considerably lower variance than SSL, our approach can generate synthetic datasets that can successfully pre-train high-quality encoders.
arXiv Detail & Related papers (2024-10-03T00:39:25Z)
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
Specifically, a generator is used to create samples through dual discriminator adversarial distillation, so that they mimic the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data (a generic, generator-based sketch of this data-free KD setup is given after this list).
arXiv Detail & Related papers (2021-04-12T12:01:45Z)
- Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis [8.87104231451079]
Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression.
In traditional KD, the transferred knowledge is usually obtained by feeding training samples to the teacher network.
However, the original training dataset is not always available due to storage costs or privacy issues.
We propose a novel data-free KD approach by modeling the intermediate feature space of the teacher.
arXiv Detail & Related papers (2021-04-10T22:42:14Z)
- OvA-INN: Continual Learning with Invertible Neural Networks [0.0]
OvA-INN is able to learn one class at a time and without storing any of the previous data.
We show that we can take advantage of pretrained models by stacking an Invertible Network on top of a feature extractor.
arXiv Detail & Related papers (2020-06-24T14:40:05Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To reduce the training cost incurred by the resulting large dataset, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method overcomes the limitations of existing KD approaches by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
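Several entries above (DDAD in particular) rely on a generator rather than real data to drive distillation. As a point of reference, here is a generic, hypothetical sketch of that family of data-free methods: a student learns to match a frozen teacher on generated images, while the generator is trained adversarially to produce images on which the two still disagree. DDAD's specific dual-discriminator design is not reproduced, and all architectures and hyperparameters below are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Maps noise vectors to 3x32x32 images (placeholder architecture)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard soft-label KD loss (Hinton et al.) used as the matching criterion."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T


def distill_without_data(teacher, student, steps=1000, batch=64, z_dim=100):
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    gen = Generator(z_dim)
    opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
    for _ in range(steps):
        # Generator step: synthesize images on which the student still
        # disagrees with the frozen teacher (maximize the KD loss).
        fake = gen(torch.randn(batch, z_dim))
        loss_g = -kd_loss(student(fake), teacher(fake))
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()

        # Student step: match the teacher on freshly generated images.
        with torch.no_grad():
            fake = gen(torch.randn(batch, z_dim))
            t_logits = teacher(fake)
        loss_s = kd_loss(student(fake), t_logits)
        opt_s.zero_grad()   # also clears stale gradients from the generator step
        loss_s.backward()
        opt_s.step()
    return student


def tiny_cnn(num_classes=10, width=32):
    """Toy classifier; in practice the teacher would be a pretrained network."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(width * 16, num_classes),
    )


# Randomly initialized models are used here purely so the snippet runs end to end.
student = distill_without_data(tiny_cnn(width=64), tiny_cnn(width=32), steps=10)
```
The alternating updates mirror a GAN: the generator acts as an adaptive transfer set, which is what removes the need for any original training data.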
This list is automatically generated from the titles and abstracts of the papers on this site.