Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
- URL: http://arxiv.org/abs/2306.06634v1
- Date: Sun, 11 Jun 2023 09:38:45 GMT
- Title: Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
- Authors: Hailin Zhang, Defang Chen, Can Wang
- Abstract summary: We propose Adaptive Multi-teacher Knowledge Distillation with Meta-Learning (MMKD) to supervise the student with appropriate knowledge from a tailored ensemble teacher.
With the help of a meta-weight network, the diverse yet compatible teacher knowledge in the output layer and intermediate layers is jointly leveraged to enhance student performance.
- Score: 16.293262022872412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-teacher knowledge distillation provides students with additional
supervision from multiple pre-trained teachers with diverse information
sources. Most existing methods explore different weighting strategies to obtain
a powerful ensemble teacher, while ignoring that a student with poor learning
ability may not benefit from such specialized integrated knowledge. To address
this problem, we propose Adaptive Multi-teacher Knowledge Distillation with
Meta-Learning (MMKD) to supervise the student with appropriate knowledge from a
tailored ensemble teacher. With the help of a meta-weight network, the diverse
yet compatible teacher knowledge in the output layer and intermediate layers is
jointly leveraged to enhance student performance. Extensive experiments on
multiple benchmark datasets validate the effectiveness and flexibility of our
method. Code is available at https://github.com/Rorozhl/MMKD.
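As a rough, non-authoritative illustration of the idea described in the abstract, the following PyTorch sketch shows meta-weighted multi-teacher distillation at the output layer only: a small meta-weight network produces per-sample teacher weights, and the student is distilled from the resulting weighted ensemble. The module names, the use of concatenated teacher logits as the meta-network input, and all hyperparameters are assumptions for illustration, not the authors' implementation (see the repository above for that).

import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaWeightNet(nn.Module):
    # Maps concatenated teacher logits to a per-teacher weight distribution for each sample.
    def __init__(self, num_teachers, num_classes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_teachers * num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_teachers),
        )

    def forward(self, teacher_logits):              # (B, K, C)
        b, k, c = teacher_logits.shape
        scores = self.net(teacher_logits.reshape(b, k * c))
        return F.softmax(scores, dim=-1)            # (B, K) sample-wise teacher weights

def ensemble_kd_loss(student_logits, teacher_logits, weights, T=4.0):
    # KL divergence between the student and the weighted ensemble teacher.
    ensemble = (weights.unsqueeze(-1) * teacher_logits).sum(dim=1)   # (B, C)
    p_teacher = F.softmax(ensemble / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# Toy usage: 3 teachers, 10 classes, batch of 8.
teacher_logits = torch.randn(8, 3, 10)
student_logits = torch.randn(8, 10, requires_grad=True)
meta_net = MetaWeightNet(num_teachers=3, num_classes=10)
loss = ensemble_kd_loss(student_logits, teacher_logits, meta_net(teacher_logits))
loss.backward()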
Related papers
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks [28.215073725175728]
We propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs.
Our experimental results on CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results.
arXiv Detail & Related papers (2022-10-27T01:03:39Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge-representation-based distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Confidence-Aware Multi-Teacher Knowledge Distillation [12.938478021855245]
Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD) is proposed.
It adaptively assigns sample-wise reliability for each teacher prediction with the help of ground-truth labels.
Our CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
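As an illustrative guess at such sample-wise reliability weighting (not CA-MKD's exact formulation), the sketch below derives each teacher's weight from its cross-entropy against the ground-truth label, so that teachers that fit the label better receive larger weights.

import torch
import torch.nn.functional as F

def confidence_weights(teacher_logits, labels):
    # teacher_logits: (B, K, C); labels: (B,). Returns (B, K) weights per sample.
    b, k, c = teacher_logits.shape
    ce = F.cross_entropy(
        teacher_logits.reshape(b * k, c),
        labels.repeat_interleave(k),
        reduction="none",
    ).reshape(b, k)
    return F.softmax(-ce, dim=-1)   # lower cross-entropy -> larger weight

# Toy usage: 3 teachers, 10 classes, batch of 8.
weights = confidence_weights(torch.randn(8, 3, 10), torch.randint(0, 10, (8,)))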
arXiv Detail & Related papers (2021-12-30T11:00:49Z)
- Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge.
We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Adaptive Multi-Teacher Multi-level Knowledge Distillation [11.722728148523366]
We propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD).
It consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights.
As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD.
arXiv Detail & Related papers (2021-03-06T08:18:16Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation (IAKD) scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
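A hypothetical sketch of the swapping-in operation described above is given below: during the forward pass, some student blocks are randomly replaced by the corresponding frozen teacher blocks. The block granularity and swap schedule are assumptions for illustration, not the paper's exact design.

import random
import torch
import torch.nn as nn

def swapped_forward(x, student_blocks, teacher_blocks, swap_prob=0.3):
    # Run the network block by block, occasionally swapping in a teacher block.
    for s_block, t_block in zip(student_blocks, teacher_blocks):
        x = t_block(x) if random.random() < swap_prob else s_block(x)
    return x

# Toy usage with matching student/teacher block shapes; teacher blocks stay frozen.
student = nn.ModuleList([nn.Linear(16, 16), nn.Linear(16, 16)])
teacher = nn.ModuleList([nn.Linear(16, 16), nn.Linear(16, 16)])
for p in teacher.parameters():
    p.requires_grad_(False)
out = swapped_forward(torch.randn(4, 16), student, teacher)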
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
- Peer Collaborative Learning for Online Knowledge Distillation [69.29602103582782]
The Peer Collaborative Learning method integrates online ensembling and network collaboration into a unified framework.
Experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks.
arXiv Detail & Related papers (2020-06-07T13:21:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.