Adaptive Multi-Teacher Multi-level Knowledge Distillation
- URL: http://arxiv.org/abs/2103.04062v1
- Date: Sat, 6 Mar 2021 08:18:16 GMT
- Title: Adaptive Multi-Teacher Multi-level Knowledge Distillation
- Authors: Yuang Liu, Wei Zhang, Jun Wang
- Abstract summary: We propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD).
It consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights, and (ii) gathering intermediate-level hints from multiple teachers via a multi-group hint strategy.
As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD.
- Score: 11.722728148523366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) is an effective learning paradigm for improving the performance of lightweight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat each teacher as equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights, which are leveraged for acquiring integrated soft targets (high-level knowledge), and (ii) enabling intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers through the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate that the proposed learning framework enables the student to achieve better performance than strong competitors.
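To make the high-level knowledge branch concrete, the sketch below shows instance-level teacher weighting for integrated soft targets in PyTorch. It is a minimal sketch written from the abstract alone: the linear gating module over student features, the temperature value, and the equal weighting of the two loss terms are illustrative assumptions rather than the authors' implementation, and the multi-group hint strategy for intermediate-level knowledge is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveMultiTeacherSoftTargetKD(nn.Module):
    """Minimal sketch of instance-level teacher weighting for soft-target
    distillation. The linear gate over student features is an illustrative
    stand-in for the latent representations AMTML-KD associates with each
    teacher; it is not the paper's exact mechanism."""

    def __init__(self, feat_dim: int, num_teachers: int, temperature: float = 4.0):
        super().__init__()
        self.temperature = temperature
        # One importance logit per teacher for every input example.
        self.gate = nn.Linear(feat_dim, num_teachers)

    def forward(self, student_logits, student_feat, teacher_logits_list, labels):
        T = self.temperature
        # Instance-level teacher importance weights: shape (batch, num_teachers).
        alpha = F.softmax(self.gate(student_feat), dim=1)
        # Softened teacher predictions stacked to shape (batch, num_teachers, classes).
        teacher_probs = torch.stack(
            [F.softmax(t / T, dim=1) for t in teacher_logits_list], dim=1)
        # Integrated soft targets: per-example convex combination over teachers.
        integrated = (alpha.unsqueeze(-1) * teacher_probs).sum(dim=1)
        # Distillation loss against the integrated soft targets plus hard-label CE.
        kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                           integrated, reduction="batchmean") * (T * T)
        ce_loss = F.cross_entropy(student_logits, labels)
        return ce_loss + kd_loss

# Shape-level usage example (all tensors are hypothetical):
# criterion = AdaptiveMultiTeacherSoftTargetKD(feat_dim=128, num_teachers=3)
# loss = criterion(s_logits, s_feat, [t1_logits, t2_logits, t3_logits], labels)
```

In the paper's formulation the weights come from latent representations associated with each teacher, and intermediate-level hints are additionally distilled through the multi-group hint strategy; the gate over student features above is used only to keep the sketch self-contained.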
Related papers
- Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning [16.293262022872412]
We propose Adaptive Multi-teacher Knowledge Distillation with Meta-Learning (MMKD) to supervise the student with appropriate knowledge from a tailored ensemble teacher.
With the help of a meta-weight network, the diverse yet compatible teacher knowledge in the output layer and intermediate layers is jointly leveraged to enhance the student performance.
arXiv Detail & Related papers (2023-06-11T09:38:45Z)
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks [28.215073725175728]
We propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs.
Our experimental results on CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results.
arXiv Detail & Related papers (2022-10-27T01:03:39Z)
- Automated Graph Self-supervised Learning via Multi-teacher Knowledge Distillation [43.903582264697974]
This paper studies the problem of how to automatically, adaptively, and dynamically learn instance-level self-supervised learning strategies for each node.
We propose a novel multi-teacher knowledge distillation framework for Automated Graph Self-Supervised Learning (AGSSL).
Experiments on eight datasets show that AGSSL can benefit from multiple pretext tasks, outperforming the corresponding individual tasks.
arXiv Detail & Related papers (2022-10-05T08:39:13Z)
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of forcing the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained on a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z)
- Confidence-Aware Multi-Teacher Knowledge Distillation [12.938478021855245]
Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD) is proposed.
It adaptively assigns sample-wise reliability for each teacher prediction with the help of ground-truth labels.
Our CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
arXiv Detail & Related papers (2021-12-30T11:00:49Z)
- Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge.
We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose a collaborative teacher-student learning framework via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- Peer Collaborative Learning for Online Knowledge Distillation [69.29602103582782]
The Peer Collaborative Learning method integrates online ensembling and network collaboration into a unified framework.
Experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks.
arXiv Detail & Related papers (2020-06-07T13:21:52Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.