ORC: Network Group-based Knowledge Distillation using Online Role Change
- URL: http://arxiv.org/abs/2206.01186v2
- Date: Tue, 8 Aug 2023 08:51:45 GMT
- Title: ORC: Network Group-based Knowledge Distillation using Online Role Change
- Authors: Junyong Choi, Hyeon Cho, Seokhwa Cheung, Wonjun Hwang
- Abstract summary: We propose an online role change strategy for multiple teacher-based knowledge distillation.
The top-ranked networks in the student group are promoted to the teacher group at every iteration.
We verify the superiority of the proposed method on CIFAR-10, CIFAR-100, and ImageNet.
- Score: 3.735965959270874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In knowledge distillation, since a single, omnipotent teacher network cannot
solve all problems, multiple teacher-based knowledge distillation has been studied
recently. However, the improvements are sometimes smaller than expected because
immature teachers may transfer false knowledge to the student. In this paper, to
overcome this limitation and exploit the efficacy of multiple networks, we divide
the networks into a teacher group and a student group. That is, the student group
is a set of immature networks that still need to learn the teacher's knowledge,
while the teacher group consists of the selected networks that are capable of
teaching successfully. We propose an online role change strategy in which the
top-ranked networks in the student group are promoted to the teacher group at
every iteration. After training the teacher group on the error samples of the
student group to refine its knowledge, we transfer the collaborative knowledge
from the teacher group to the student group. We verify the superiority of the
proposed method on CIFAR-10, CIFAR-100, and ImageNet, where it achieves high
performance. We further show the generality of our method with various backbone
architectures such as ResNet, WRN, VGG, MobileNet, and ShuffleNet.
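A minimal sketch of the role-change step described in the abstract is given below. The ranking criterion (per-batch accuracy), the fixed teacher-group size, and the helper names are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of the online role change and error-sample selection
# described in the abstract. The ranking metric (per-batch accuracy) and the
# fixed teacher-group size are assumptions for illustration only.
import torch


def reassign_roles(networks, images, labels, num_teachers):
    """Rank all networks on the current batch and redraw the teacher/student
    split: the top-ranked networks form the teacher group."""
    scores = []
    with torch.no_grad():
        for net in networks:
            preds = net(images).argmax(dim=1)
            scores.append((preds == labels).float().mean().item())
    order = sorted(range(len(networks)), key=lambda i: scores[i], reverse=True)
    teachers = [networks[i] for i in order[:num_teachers]]
    students = [networks[i] for i in order[num_teachers:]]
    return teachers, students


def error_samples(students, images, labels):
    """Keep the samples that at least one student misclassifies; the abstract
    says these are used to refine the teacher group before distillation."""
    with torch.no_grad():
        wrong = torch.zeros(len(labels), dtype=torch.bool, device=labels.device)
        for net in students:
            wrong |= net(images).argmax(dim=1) != labels
    return images[wrong], labels[wrong]
```

In each iteration one would call `reassign_roles`, fine-tune the teacher group on the output of `error_samples`, and then distill the teachers' ensembled predictions into the student group.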
Related papers
- Adaptive Teaching with Shared Classifier for Knowledge Distillation [6.03477652126575]
Knowledge distillation (KD) is a technique used to transfer knowledge from a teacher network to a student network.
We propose adaptive teaching with a shared classifier (ATSC).
Our approach achieves state-of-the-art results on the CIFAR-100 and ImageNet datasets in both single-teacher and multi-teacher scenarios.
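One plausible reading of the shared-classifier idea is that the student backbone is trained against the teacher's classification head; the sketch below follows that reading, and keeping the head frozen is an assumption, not the paper's verified design.

```python
# Hypothetical sketch: the student reuses the teacher's classifier head, so
# only the student backbone is trained. Freezing the head is an assumption
# made for illustration.
import torch.nn as nn


class StudentWithSharedClassifier(nn.Module):
    def __init__(self, student_backbone, teacher_classifier):
        super().__init__()
        self.backbone = student_backbone
        self.classifier = teacher_classifier
        for p in self.classifier.parameters():  # reuse the head, don't retrain it
            p.requires_grad = False

    def forward(self, x):
        return self.classifier(self.backbone(x))
```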
arXiv Detail & Related papers (2024-06-12T08:51:08Z) - Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation [20.17325172100031]
We propose a novel student-helping-teacher formula, Teacher Evolution via Self-Knowledge Distillation (TESKD), where the target teacher is learned with the help of multiple hierarchical students by sharing the structural backbone.
The effectiveness of our proposed framework is demonstrated by extensive experiments with various network settings on two standard benchmarks including CIFAR-100 and ImageNet.
arXiv Detail & Related papers (2021-10-01T11:46:12Z) - Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z) - Does Knowledge Distillation Really Work? [106.38447017262183]
We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood.
We identify difficulties in optimization as a key reason for why the student is unable to match the teacher.
arXiv Detail & Related papers (2021-06-10T17:44:02Z) - Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can easily be combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z) - Densely Guided Knowledge Distillation using Multiple Teacher Assistants [5.169724825219126]
We propose a densely guided knowledge distillation using multiple teacher assistants that gradually decreases the model size.
We also design a teaching scheme in which, for each mini-batch, the teacher or some teacher assistants are randomly dropped.
This acts as a regularizer that improves the efficiency of teaching the student network.
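A toy sketch of this random-drop regularizer is shown below; averaging the surviving soft targets and the particular keep probability are illustrative assumptions rather than the paper's exact loss.

```python
# Illustrative sketch: per mini-batch, randomly drop the teacher/assistant
# guides and distill from the mean of the surviving soft targets (assumed form).
import random
import torch
import torch.nn.functional as F


def densely_guided_kd_loss(student_logits, guide_logits_list, labels,
                           temperature=4.0, keep_prob=0.75):
    """guide_logits_list holds logits from the teacher and every assistant."""
    kept = [g for g in guide_logits_list if random.random() < keep_prob]
    if not kept:                        # always keep at least one guide
        kept = [random.choice(guide_logits_list)]
    soft_target = torch.stack(kept).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.softmax(soft_target / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    return kd + F.cross_entropy(student_logits, labels)
```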
arXiv Detail & Related papers (2020-09-18T13:12:52Z) - Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning material and the teacher/student updates can be conducted more than once, iteratively improving the network's capability.
arXiv Detail & Related papers (2020-08-26T06:39:24Z) - Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
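The swapping-in operation can be pictured as occasionally letting a teacher block stand in for the matching student block during the forward pass; the stage-wise alignment, swap probability, and class below are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of a swapping-in forward pass: during training, each
# student stage is replaced by the corresponding frozen teacher stage with
# some probability. Assumes the two networks have dimensionally compatible stages.
import random
import torch.nn as nn


class SwapInForward(nn.Module):
    def __init__(self, student_stages, teacher_stages, classifier, swap_prob=0.3):
        super().__init__()
        self.student_stages = nn.ModuleList(student_stages)
        self.teacher_stages = nn.ModuleList(teacher_stages)
        self.classifier = classifier
        for p in self.teacher_stages.parameters():  # teacher blocks stay fixed
            p.requires_grad = False
        self.swap_prob = swap_prob

    def forward(self, x):
        for s_stage, t_stage in zip(self.student_stages, self.teacher_stages):
            if self.training and random.random() < self.swap_prob:
                x = t_stage(x)   # teacher block fills in for this stage
            else:
                x = s_stage(x)
        return self.classifier(x)
```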
arXiv Detail & Related papers (2020-07-03T03:22:04Z) - Peer Collaborative Learning for Online Knowledge Distillation [69.29602103582782]
The Peer Collaborative Learning method integrates online ensembling and network collaboration into a unified framework.
Experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks.
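The online-ensembling part can be sketched as distilling each peer toward the detached mean of all peers' logits; the temperature, weighting, and loss form below are assumptions for illustration, not the paper's full framework (which also involves network collaboration).

```python
# Minimal sketch of online ensembling among peers: the detached mean of the
# peer logits serves as a shared soft target. Temperature and weighting are
# illustrative assumptions.
import torch
import torch.nn.functional as F


def peer_ensemble_loss(peer_logits, labels, temperature=3.0, alpha=1.0):
    ensemble = torch.stack(peer_logits).mean(dim=0).detach()
    loss = 0.0
    for logits in peer_logits:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                      F.softmax(ensemble / temperature, dim=1),
                      reduction="batchmean") * temperature ** 2
        loss = loss + ce + alpha * kd
    return loss
```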
arXiv Detail & Related papers (2020-06-07T13:21:52Z) - Teacher-Class Network: A Neural Network Compression Mechanism [2.257416403770908]
Instead of transferring knowledge to one student only, the proposed method transfers a chunk of knowledge to each student.
Our students are not trained on problem-specific logits; instead, they are trained to mimic the knowledge (a dense representation) learned by the teacher network.
The proposed teacher-class architecture is evaluated on several benchmark datasets such as MNIST, Fashion MNIST, IMDB Movie Reviews, CAMVid, CIFAR-10 and ImageNet.
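Read literally, each student mimics one slice of the teacher's dense representation; the equal-width chunking and MSE objective below are assumptions used only to make the idea concrete.

```python
# Illustrative sketch: split the teacher's dense representation into equal
# chunks and let each student regress one chunk. Equal-width chunking and
# the MSE objective are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F


def teacher_class_losses(teacher_features, student_outputs):
    """teacher_features: (batch, d); student_outputs: k tensors of shape (batch, d/k)."""
    chunks = torch.chunk(teacher_features.detach(), len(student_outputs), dim=1)
    return [F.mse_loss(out, chunk) for out, chunk in zip(student_outputs, chunks)]
```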
arXiv Detail & Related papers (2020-04-07T11:31:20Z)