Peer Collaborative Learning for Online Knowledge Distillation
- URL: http://arxiv.org/abs/2006.04147v2
- Date: Wed, 3 Mar 2021 15:00:39 GMT
- Title: Peer Collaborative Learning for Online Knowledge Distillation
- Authors: Guile Wu and Shaogang Gong
- Abstract summary: The Peer Collaborative Learning method integrates online ensembling and network collaboration into a unified framework.
Experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks.
- Score: 69.29602103582782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional knowledge distillation uses a two-stage training strategy to
transfer knowledge from a high-capacity teacher model to a compact student
model, which relies heavily on the pre-trained teacher. Recent online knowledge
distillation alleviates this limitation by collaborative learning, mutual
learning and online ensembling, following a one-stage end-to-end training
fashion. However, collaborative learning and mutual learning fail to construct
an online high-capacity teacher, whilst online ensembling ignores the
collaboration among branches and its logit summation impedes the further
optimisation of the ensemble teacher. In this work, we propose a novel Peer
Collaborative Learning method for online knowledge distillation, which
integrates online ensembling and network collaboration into a unified
framework. Specifically, given a target network, we construct a multi-branch
network for training, in which each branch is called a peer. We perform random
augmentation multiple times on the inputs to peers and assemble feature
representations output by peers with an additional classifier as the peer
ensemble teacher. This helps to transfer knowledge from a high-capacity teacher
to peers, and in turn further optimises the ensemble teacher. Meanwhile, we
employ the temporal mean model of each peer as the peer mean teacher to
collaboratively transfer knowledge among peers, which helps each peer to learn
richer knowledge and facilitates the optimisation of a more stable model with better
generalisation. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show
that the proposed method significantly improves the generalisation of various
backbone networks and outperforms the state-of-the-art methods.
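
The three ingredients named in the abstract (a peer ensemble teacher built from assembled peer features, a temporal mean model per peer, and knowledge transfer from a teacher to a peer) can be illustrated with a minimal plain-Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the exponential moving average as the temporal mean, the temperature-scaled KL divergence as the transfer loss, the default `decay` and `temperature` values, and all function names. It is not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=3.0):
    """Assumed transfer loss: KL between softened teacher and student outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def update_mean_teacher(mean_weights, peer_weights, decay=0.999):
    """Assumed temporal mean: exponential moving average of a peer's weights."""
    return [decay * m + (1.0 - decay) * w
            for m, w in zip(mean_weights, peer_weights)]


def ensemble_logits(peer_features, classifier_weights):
    """Concatenate peer features and apply one extra linear classifier."""
    concatenated = [x for features in peer_features for x in features]
    return [sum(w * x for w, x in zip(row, concatenated))
            for row in classifier_weights]


# Toy example: three peers with 2-d features, two classes (hypothetical values).
peer_features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
classifier_weights = [[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
                      [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]]
teacher_logits = ensemble_logits(peer_features, classifier_weights)

peer_logits = [2.0, 0.5]  # output of one peer's own classifier
loss_from_ensemble = distillation_loss(teacher_logits, peer_logits)

mean_weights = update_mean_teacher([0.0, 0.0], [1.0, -1.0], decay=0.9)
```

In a real training loop the ensemble classifier and the peers would be trained jointly with a supervised loss, and the mean-teacher weights would be updated after every optimiser step; those details are left out of this sketch.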
Related papers
- Decoupled Knowledge with Ensemble Learning for Online Distillation [3.794605440322862]
Online knowledge distillation is a one-stage strategy that alleviates the requirement for a pre-trained teacher through mutual learning and collaborative learning.
Recent peer collaborative learning (PCL) integrates online ensemble, collaboration of base networks and temporal mean teacher to construct effective knowledge.
A decoupled knowledge for online knowledge distillation is generated by an independent teacher, separate from the student.
arXiv Detail & Related papers (2023-12-18T14:08:59Z)
- Heterogeneous-Branch Collaborative Learning for Dialogue Generation [11.124375734351826]
Collaborative learning is an effective way to conduct one-stage group distillation in the absence of a well-trained large teacher model.
Previous work has a severe branch homogeneity problem due to the same training objective and independent identical training sets.
We propose a dual group-based knowledge distillation method, consisting of positive distillation and negative distillation, to further diversify the features of different branches in a steady and interpretable way.
arXiv Detail & Related papers (2023-03-21T06:41:50Z)
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks [28.215073725175728]
We propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs.
Our experimental results on CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results.
arXiv Detail & Related papers (2022-10-27T01:03:39Z)
- Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge.
We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient optimization based teacher-aware learner who can incorporate teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance on stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.