Knowledge Distillation via the Target-aware Transformer
- URL: http://arxiv.org/abs/2205.10793v2
- Date: Mon, 8 Apr 2024 16:59:24 GMT
- Title: Knowledge Distillation via the Target-aware Transformer
- Authors: Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang, Gang Wang,
- Abstract summary: We propose a novel one-to-all spatial matching knowledge distillation approach.
Specifically, we allow each pixel of the teacher feature to be distilled to all spatial locations of the student features.
Our approach surpasses the state-of-the-art methods by a significant margin on various computer vision benchmarks.
- Score: 83.03578375615614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation becomes a de facto standard to improve the performance of small neural networks. Most of the previous works propose to regress the representational features from the teacher to the student in a one-to-one spatial matching fashion. However, people tend to overlook the fact that, due to the architecture differences, the semantic information on the same spatial location usually vary. This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. Specifically, we allow each pixel of the teacher feature to be distilled to all spatial locations of the student features given its similarity, which is generated from a target-aware transformer. Our approach surpasses the state-of-the-art methods by a significant margin on various computer vision benchmarks, such as ImageNet, Pascal VOC and COCOStuff10k. Code is available at https://github.com/sihaoevery/TaT.
Related papers
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels [65.64402188506644]
vanilla Transformers can operate by treating each individual pixel as a token and achieve highly performant results.
We mainly showcase the effectiveness of pixels-as-tokens across three well-studied tasks in computer vision.
arXiv Detail & Related papers (2024-06-13T17:59:58Z) - Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers [22.1372572833618]
We propose a novel few-shot feature distillation approach for vision transformers.
We first copy the weights from intermittent layers of existing vision transformers into shallower architectures (students)
Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario.
arXiv Detail & Related papers (2024-04-14T18:57:38Z) - Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z) - Distilling Inter-Class Distance for Semantic Segmentation [17.76592932725305]
We propose an Inter-class Distance Distillation (IDD) method to transfer the inter-class distance in the feature space from the teacher network to the student network.
Our method is helpful to improve the accuracy of semantic segmentation models and achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-05-07T13:13:55Z) - SATS: Self-Attention Transfer for Continual Semantic Segmentation [50.51525791240729]
continual semantic segmentation suffers from the same catastrophic forgetting issue as in continual classification learning.
This study proposes to transfer a new type of information relevant to knowledge, i.e. the relationships between elements within each image.
The relationship information can be effectively obtained from the self-attention maps in a Transformer-style segmentation model.
arXiv Detail & Related papers (2022-03-15T06:09:28Z) - It's All in the Head: Representation Knowledge Distillation through
Classifier Sharing [0.29360071145551075]
We introduce two approaches for enhancing representation distillation using classifier sharing between the teacher and student.
We show the effectiveness of the proposed methods on various datasets and tasks, including image classification, fine-grained classification, and face verification.
arXiv Detail & Related papers (2022-01-18T13:10:36Z) - Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition [124.80263629921498]
We propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints.
Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources.
arXiv Detail & Related papers (2021-12-17T14:31:40Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning in variance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.