A Fast Knowledge Distillation Framework for Visual Recognition
- URL: http://arxiv.org/abs/2112.01528v1
- Date: Thu, 2 Dec 2021 18:59:58 GMT
- Title: A Fast Knowledge Distillation Framework for Visual Recognition
- Authors: Zhiqiang Shen and Eric Xing
- Abstract summary: The Fast Knowledge Distillation (FKD) framework replicates the distillation training phase and generates soft labels using the multi-crop KD approach.
Our FKD is even more efficient than the traditional image classification framework.
- Score: 17.971973892352864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Knowledge Distillation (KD) has been recognized as a useful tool in
many visual tasks, such as supervised classification and self-supervised
representation learning, the main drawback of a vanilla KD framework is its
mechanism, which consumes the majority of the computational overhead on
forwarding through the giant teacher networks, making the entire learning
procedure inefficient and costly. ReLabel, a recently proposed solution,
suggests creating a label map for the entire image. During training, it
receives the cropped region-level label by RoI aligning on a pre-generated
entire label map, allowing for efficient supervision generation without having
to pass through the teachers many times. However, as the KD teachers are from
conventional multi-crop training, there are various mismatches between the
global label-map and region-level label in this technique, resulting in
performance deterioration. In this study, we present a Fast Knowledge
Distillation (FKD) framework that replicates the distillation training phase
and generates soft labels using the multi-crop KD approach, while training
faster than ReLabel since no post-processes such as RoI align and softmax
operations are used. When conducting multi-crop in the same image for data
loading, our FKD is even more efficient than the traditional image
classification framework. On ImageNet-1K, we obtain 79.8% with ResNet-50,
outperforming ReLabel by ~1.0% while being faster. On the self-supervised
learning task, we also show that FKD has an efficiency advantage. Our project
page: http://zhiqiangshen.com/projects/FKD/index.html, source code and models
are available at: https://github.com/szq0214/FKD.
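The two-phase idea described in the abstract (generate soft labels per crop once, then train the student from the stored labels without running the teacher again) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `teacher` and `student` callables, the crop-box sampling, and the storage format are all hypothetical, and storing post-softmax probabilities is an assumption based on the abstract's remark that no softmax post-processing is needed at training time.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_soft_labels(image_id, teacher, num_crops, rng):
    """Phase 1 (run once): store each crop box with the teacher's soft label.

    `teacher(image_id, box)` is a hypothetical callable returning class logits.
    """
    records = []
    for _ in range(num_crops):
        # Crop box as fractions of the image: (x0, y0, x1, y1).
        x0, y0 = rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)
        x1, y1 = rng.uniform(x0 + 0.1, 1.0), rng.uniform(y0 + 0.1, 1.0)
        box = (x0, y0, x1, y1)
        # Assumption: probabilities are stored, so training needs no softmax
        # on the teacher side and no RoI align.
        records.append((box, softmax(teacher(image_id, box))))
    return records


def soft_cross_entropy(student_logits, soft_label):
    """Cross-entropy between a stored soft label and the student's prediction."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in student_logits))
    return -sum(t * (v - log_z) for t, v in zip(soft_label, student_logits))


def student_loss(records, student):
    """Phase 2: average loss over all stored crops of one image.

    `student(box)` is a hypothetical callable returning logits for that crop.
    The teacher is never called here.
    """
    losses = [soft_cross_entropy(student(box), label) for box, label in records]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for real networks: logits derived from the crop box only.
    toy_teacher = lambda image_id, box: [box[0], box[1], box[2]]
    toy_student = lambda box: [0.0, 0.0, 0.0]
    stored = generate_soft_labels("img_0001", toy_teacher, num_crops=4, rng=rng)
    print(student_loss(stored, toy_student))
```

Reusing several stored crops of the same image within one batch, as `student_loss` does here, mirrors the abstract's point about conducting multi-crop in the same image for data loading.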
Related papers
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation [4.182345120164705]
This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD).
CES-KD is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum.
Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty in classifying the image.
arXiv Detail & Related papers (2022-09-15T21:02:57Z)
- Black-box Few-shot Knowledge Distillation [55.27881513982002]
Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network.
We propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher.
We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks.
arXiv Detail & Related papers (2022-07-25T12:16:53Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Self Supervision to Distillation for Long-Tailed Visual Recognition [34.29744530188875]
We show that soft labels can serve as a powerful solution to incorporate label correlation into a multi-stage training scheme for long-tailed recognition.
Specifically, we propose a conceptually simple yet particularly effective multi-stage training scheme, termed Self Supervision to Distillation (SSD).
Our method achieves state-of-the-art results on three long-tailed recognition benchmarks: ImageNet-LT, CIFAR100-LT and iNaturalist 2018.
arXiv Detail & Related papers (2021-09-09T07:38:30Z)
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
Our proposed method, FRSKD, can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
arXiv Detail & Related papers (2021-03-15T10:59:43Z)
- Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
The KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
- Semantic Labeling of Large-Area Geographic Regions Using Multi-View and Multi-Date Satellite Images and Noisy OSM Training Labels [0.0]
We present a novel multi-view training framework and CNN architecture for semantically labeling buildings and roads.
Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to the traditional approaches.
arXiv Detail & Related papers (2020-08-24T09:03:31Z)
- Big Self-Supervised Models are Strong Semi-Supervised Learners [116.00752519907725]
We show that unsupervised pretraining followed by supervised fine-tuning is surprisingly effective for semi-supervised learning on ImageNet.
A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning.
We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network.
arXiv Detail & Related papers (2020-06-17T17:48:22Z)
- Inter-Region Affinity Distillation for Road Marking Segmentation [81.3619453527367]
We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network.
Our method is known as Inter-Region Affinity KD (IntRA-KD).
arXiv Detail & Related papers (2020-04-11T04:26:37Z)