Knowledge Distillation via Token-level Relationship Graph
- URL: http://arxiv.org/abs/2306.12442v1
- Date: Tue, 20 Jun 2023 08:16:37 GMT
- Title: Knowledge Distillation via Token-level Relationship Graph
- Authors: Shuoxi Zhang, Hanpeng Liu, Kun He
- Abstract summary: We propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG)
By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model.
We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches.
- Score: 12.356770685214498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is a powerful technique for transferring knowledge
from a pre-trained teacher model to a student model. However, the true
potential of knowledge transfer has not been fully explored. Existing
approaches primarily focus on distilling individual information or
instance-level relationships, overlooking the valuable information embedded in
token-level relationships, which may be particularly affected by the long-tail
effects. To address the above limitations, we propose a novel method called
Knowledge Distillation with Token-level Relationship Graph (TRG) that leverages
the token-wise relational knowledge to enhance the performance of knowledge
distillation. By employing TRG, the student model can effectively emulate
higher-level semantic information from the teacher model, resulting in improved
distillation results. To further enhance the learning process, we introduce a
token-wise contextual loss called contextual loss, which encourages the student
model to capture the inner-instance semantic contextual of the teacher model.
We conduct experiments to evaluate the effectiveness of the proposed method
against several state-of-the-art approaches. Empirical results demonstrate the
superiority of TRG across various visual classification tasks, including those
involving imbalanced data. Our method consistently outperforms the existing
baselines, establishing a new state-of-the-art performance in the field of
knowledge distillation.
Related papers
- Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed method, called adaptive explicit knowledge transfer (AEKT) method, achieves improved performance compared to the state-of-the-art KD methods.
arXiv Detail & Related papers (2024-09-03T07:42:59Z) - Graph Relation Distillation for Efficient Biomedical Instance
Segmentation [80.51124447333493]
We propose a graph relation distillation approach for efficient biomedical instance segmentation.
We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level.
Experimental results on a number of biomedical datasets validate the effectiveness of our approach.
arXiv Detail & Related papers (2024-01-12T04:41:23Z) - The Staged Knowledge Distillation in Video Classification: Harmonizing
Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
arXiv Detail & Related papers (2023-07-11T12:10:42Z) - AD-KD: Attribution-Driven Knowledge Distillation for Language Model
Compression [26.474962405945316]
We present a novel attribution-driven knowledge distillation approach to compress pre-trained language models.
To enhance the knowledge transfer of model reasoning and generalization, we explore multi-view attribution distillation on all potential decisions of the teacher.
arXiv Detail & Related papers (2023-05-17T07:40:12Z) - Hint-dynamic Knowledge Distillation [30.40008256306688]
Hint-dynamic Knowledge Distillation, dubbed HKD, excavates the knowledge from the teacher's hints in a dynamic scheme.
A meta-weight network is introduced to generate the instance-wise weight coefficients about knowledge hints.
Experiments on standard benchmarks of CIFAR-100 and Tiny-ImageNet manifest that the proposed HKD well boost the effect of knowledge distillation tasks.
arXiv Detail & Related papers (2022-11-30T15:03:53Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel em modelname (bfem shortname) method dedicated for distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation with open-set semi-supervised learning (SSL)
Our shortname outperforms significantly previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z) - A Closer Look at Knowledge Distillation with Features, Logits, and
Gradients [81.39206923719455]
Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.
This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources.
Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design.
arXiv Detail & Related papers (2022-03-18T21:26:55Z) - On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z) - Annealing Knowledge Distillation [5.396407687999048]
We propose an improved knowledge distillation method (called Annealing-KD) by feeding the rich information provided by the teacher's soft-targets incrementally and more efficiently.
This paper includes theoretical and empirical evidence as well as practical experiments to support the effectiveness of our Annealing-KD method.
arXiv Detail & Related papers (2021-04-14T23:45:03Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.