Towards a Unified View of Affinity-Based Knowledge Distillation
- URL: http://arxiv.org/abs/2209.15555v1
- Date: Fri, 30 Sep 2022 16:12:25 GMT
- Title: Towards a Unified View of Affinity-Based Knowledge Distillation
- Authors: Vladimir Li and Atsuto Maki
- Abstract summary: We modularise knowledge distillation into a framework of three components, i.e. affinity, normalisation, and loss.
We show how relation-based knowledge distillation can achieve performance comparable to the state of the art despite its simplicity.
- Score: 5.482532589225552
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge transfer between artificial neural networks has become an important
topic in deep learning. Among the open questions are what kind of knowledge
needs to be preserved for the transfer, and how it can be effectively achieved.
Several recent works have shown good performance of distillation methods using
relation-based knowledge. These algorithms are extremely attractive in that
they are based on simple inter-sample similarities. Nevertheless, a proper
metric of affinity, and how to use it in this context, are far from well understood.
In this paper, by explicitly modularising knowledge distillation into a
framework of three components, i.e. affinity, normalisation, and loss, we give
a unified treatment of these algorithms as well as study a number of unexplored
combinations of the modules. With this framework we perform extensive
evaluations of numerous distillation objectives for image classification, and
obtain a few useful insights for effective design choices while demonstrating
how relation-based knowledge distillation can achieve performance comparable
to the state of the art despite its simplicity.
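As a rough illustration of this three-component view, the following is a minimal sketch, assuming PyTorch and batch-level penultimate-layer features, of one affinity/normalisation/loss combination: cosine affinity between samples, row-softmax normalisation, and a KL-divergence loss. All function and variable names are illustrative assumptions, not the paper's implementation.
```python
# A minimal sketch of the affinity -> normalisation -> loss decomposition for
# relation-based knowledge distillation. All names are illustrative, not the
# paper's actual implementation.
import torch
import torch.nn.functional as F


def affinity_matrix(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity ("affinity") between samples in a batch.

    features: (batch, dim) penultimate-layer activations.
    Returns a (batch, batch) affinity matrix.
    """
    z = F.normalize(features.flatten(start_dim=1), dim=1)
    return z @ z.t()


def relation_kd_loss(student_feats: torch.Tensor,
                     teacher_feats: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """One affinity/normalisation/loss combination: cosine affinity,
    row-softmax normalisation, KL-divergence loss."""
    a_s = affinity_matrix(student_feats) / temperature
    a_t = affinity_matrix(teacher_feats) / temperature
    log_p_s = F.log_softmax(a_s, dim=1)   # normalisation of student affinities
    p_t = F.softmax(a_t, dim=1)           # normalisation of teacher affinities
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2


# Usage: combine with the ordinary cross-entropy objective, e.g.
#   loss = ce_loss + beta * relation_kd_loss(f_student, f_teacher)
# where beta is a weighting hyperparameter.
```
Other affinity measures (e.g. Euclidean distance), normalisations, and losses can be slotted into the same skeleton, which is the combinatorial space the paper evaluates.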
Related papers
- Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition [53.359383163184425]
We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks.
This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network.
arXiv Detail & Related papers (2024-06-20T07:24:47Z) - Knowledge Distillation via Token-level Relationship Graph [12.356770685214498]
We propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG).
By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model.
We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-20T08:16:37Z) - Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object
Detection [58.48995335728938]
We learn three types of class-agnostic commonalities between base and novel classes explicitly.
Our method can be readily integrated into most of existing fine-tuning based methods and consistently improve the performance by a large margin.
arXiv Detail & Related papers (2022-07-22T16:46:51Z) - Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z) - A Closer Look at Knowledge Distillation with Features, Logits, and
Gradients [81.39206923719455]
Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.
This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources.
Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design (a minimal logit-distillation sketch appears after this list).
arXiv Detail & Related papers (2022-03-18T21:26:55Z) - Information Theoretic Representation Distillation [20.802135299032308]
We forge an alternative connection between information theory and knowledge distillation using a recently proposed entropy-like functional.
Our method achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks.
We also shed light on a new state of the art for binary quantisation.
arXiv Detail & Related papers (2021-12-01T12:39:50Z) - Adaptive Distillation: Aggregating Knowledge from Multiple Paths for
Efficient Distillation [15.337420940135704]
Knowledge Distillation is becoming one of the primary trends among neural network compression algorithms.
This paper introduces an adaptive distillation approach based on multitask learning methods.
We empirically demonstrate the effectiveness of the proposed approach over other baselines on the applications of knowledge distillation in classification, semantic segmentation, and object detection tasks.
arXiv Detail & Related papers (2021-10-19T00:57:40Z) - Towards Understanding Ensemble, Knowledge Distillation and
Self-Distillation in Deep Learning [93.18238573921629]
We study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model.
We show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory.
We prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
arXiv Detail & Related papers (2020-12-17T18:34:45Z) - On the Orthogonality of Knowledge Distillation with Other Techniques:
From an Ensemble Perspective [34.494730096460636]
We show that knowledge distillation is a powerful apparatus for practical deployment of efficient neural networks.
We also introduce ways to integrate knowledge distillation with other methods effectively.
arXiv Detail & Related papers (2020-09-09T06:14:59Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
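For contrast with the relation-based objectives above, the following is a minimal sketch of classical logit-based distillation (temperature-scaled KL divergence in the spirit of Hinton et al.), the knowledge source highlighted in the "A Closer Look at Knowledge Distillation with Features, Logits, and Gradients" entry. The function name and default temperature are illustrative assumptions, not taken from that paper.
```python
# A minimal sketch of classical logit-based distillation (temperature-scaled
# KL divergence in the spirit of Hinton et al.). Names and the default
# temperature are illustrative assumptions.
import torch
import torch.nn.functional as F


def logit_kd_loss(student_logits: torch.Tensor,
                  teacher_logits: torch.Tensor,
                  temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between softened teacher and student class distributions."""
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2
```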