Related papers: A Fast Knowledge Distillation Framework for Visual Recognition

URL: http://arxiv.org/abs/2112.01528v1
Date: Thu, 2 Dec 2021 18:59:58 GMT
Title: A Fast Knowledge Distillation Framework for Visual Recognition
Authors: Zhiqiang Shen and Eric Xing
Abstract summary: The Fast Knowledge Distillation (FKD) framework replicates the distillation training phase and generates soft labels using the multi-crop KD approach. When multiple crops are taken from the same image during data loading, FKD is even more efficient than the traditional image classification framework.
Score: 17.971973892352864
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as supervised classification and self-supervised representation learning, the main drawback of a vanilla KD framework is its mechanism, which consumes the majority of the computational overhead on forwarding through the giant teacher networks, making the entire learning procedure inefficient and costly. ReLabel, a recently proposed solution, suggests creating a label map for the entire image. During training, it receives the cropped region-level label by RoI aligning on a pre-generated entire label map, allowing for efficient supervision generation without having to pass through the teachers many times. However, as the KD teachers are from conventional multi-crop training, there are various mismatches between the global label-map and region-level label in this technique, resulting in performance deterioration. In this study, we present a Fast Knowledge Distillation (FKD) framework that replicates the distillation training phase and generates soft labels using the multi-crop KD approach, while training faster than ReLabel since no post-processes such as RoI align and softmax operations are used. When conducting multi-crop in the same image for data loading, our FKD is even more efficient than the traditional image classification framework. On ImageNet-1K, we obtain 79.8% with ResNet-50, outperforming ReLabel by ~1.0% while being faster. On the self-supervised learning task, we also show that FKD has an efficiency advantage. Our project page: http://zhiqiangshen.com/projects/FKD/index.html, source code and models are available at: https://github.com/szq0214/FKD.
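The two-phase idea in the abstract (run the teacher once per crop offline, then train the student from the stored soft labels without further teacher passes) can be sketched roughly as follows. This is a minimal illustration in pure Python, not the authors' implementation: `toy_teacher`, `random_crop_box`, `generate_soft_labels`, and the storage layout are all hypothetical names chosen here, and the "teacher" is a deterministic random stand-in rather than a real network.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_crop_box(width, height, rng, min_scale=0.08):
    """Sample a crop box (x, y, w, h) that stays inside the image."""
    scale = rng.uniform(min_scale, 1.0)
    w = max(1, int(width * math.sqrt(scale)))
    h = max(1, int(height * math.sqrt(scale)))
    x = rng.randint(0, width - w)
    y = rng.randint(0, height - h)
    return (x, y, w, h)


def toy_teacher(image_id, box, num_classes):
    """Hypothetical stand-in for one teacher forward pass; returns logits."""
    rng = random.Random(f"{image_id}-{box}")  # deterministic per crop
    return [rng.gauss(0.0, 1.0) for _ in range(num_classes)]


def generate_soft_labels(image_sizes, crops_per_image, num_classes, seed=0):
    """Offline phase: one teacher pass per crop, store (box, soft label)."""
    rng = random.Random(seed)
    store = {}
    for image_id, (width, height) in image_sizes.items():
        entries = []
        for _ in range(crops_per_image):
            box = random_crop_box(width, height, rng)
            # Probabilities are stored directly, so no softmax (and no
            # RoI align) is needed when the labels are read back.
            probs = softmax(toy_teacher(image_id, box, num_classes))
            entries.append((box, probs))
        store[image_id] = entries
    return store


def soft_cross_entropy(student_logits, soft_label):
    """Cross-entropy between a stored soft label and student logits."""
    m = max(student_logits)
    lse = m + math.log(sum(math.exp(x - m) for x in student_logits))
    return -sum(p * (x - lse) for p, x in zip(soft_label, student_logits))


# Training phase (sketch): several stored crops of the same image are
# consumed together; the teacher is never called here.
store = generate_soft_labels({0: (224, 224), 1: (320, 240)}, 4, 5)
for image_id, entries in store.items():
    for box, soft_label in entries:
        student_logits = [0.0] * 5  # placeholder for a student forward pass
        loss = soft_cross_entropy(student_logits, soft_label)
```

Storing the crop box next to each soft label is what lets the training loop reproduce exactly the region the teacher saw; how the labels are compressed or serialized on disk is left out of this sketch.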

Related papers

Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representations, so that the student distills from task-relevant representations only. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z)
Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge. We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks. Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation [4.182345120164705]
This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD). CES-KD is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty in classifying the image.
arXiv Detail & Related papers (2022-09-15T21:02:57Z)
Black-box Few-shot Knowledge Distillation [55.27881513982002]
Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network. We propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher. We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks.
arXiv Detail & Related papers (2022-07-25T12:16:53Z)
Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images. MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
InDistill: Information flow-preserving knowledge distillation for model compression [20.88709060450944]
We introduce InDistill, a method that serves as a warmup stage for Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. The proposed method is extensively evaluated using various pairs of teacher-student architectures on CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2022-05-20T07:40:09Z)
Self Supervision to Distillation for Long-Tailed Visual Recognition [34.29744530188875]
We show that soft labels can serve as a powerful solution to incorporate label correlation into a multi-stage training scheme for long-tailed recognition. Specifically, we propose a conceptually simple yet particularly effective multi-stage training scheme, termed Self Supervised to Distillation (SSD). Our method achieves state-of-the-art results on three long-tailed recognition benchmarks: ImageNet-LT, CIFAR100-LT and iNaturalist 2018.
arXiv Detail & Related papers (2021-09-09T07:38:30Z)
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD). Our proposed method, FRSKD, can utilize both soft label and feature-map distillations for the self-knowledge distillation. We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
arXiv Detail & Related papers (2021-03-15T10:59:43Z)
Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
The KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks. It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features. It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
Semantic Labeling of Large-Area Geographic Regions Using Multi-View and Multi-Date Satellite Images and Noisy OSM Training Labels [0.0]
We present a novel multi-view training framework and CNN architecture for semantically labeling buildings and roads. Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to the traditional approaches.
arXiv Detail & Related papers (2020-08-24T09:03:31Z)
Big Self-Supervised Models are Strong Semi-Supervised Learners [116.00752519907725]
We show that unsupervised pretraining followed by supervised fine-tuning is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network.
arXiv Detail & Related papers (2020-06-17T17:48:22Z)
Inter-Region Affinity Distillation for Road Marking Segmentation [81.3619453527367]
We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network. Our method is known as Inter-Region Affinity KD (IntRA-KD).
arXiv Detail & Related papers (2020-04-11T04:26:37Z)
