Cooperative Knowledge Distillation: A Learner Agnostic Approach
- URL: http://arxiv.org/abs/2402.05942v1
- Date: Fri, 2 Feb 2024 17:31:50 GMT
- Title: Cooperative Knowledge Distillation: A Learner Agnostic Approach
- Authors: Michael Livanos, Ian Davidson, Stephen Wong
- Abstract summary: We formulate a novel form of knowledge distillation in which many models can act as both students and teachers.
Because different models may have different strengths and weaknesses, each can act as a student or a teacher when appropriate, distilling only in its areas of strength.
- Score: 15.414204257189596
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Knowledge distillation is a simple but powerful way to transfer knowledge
from a teacher model to a student model. Existing work suffers from at least
one of the following key limitations in the direction and scope of
transfer, which restrict its use: all knowledge is transferred from teacher to
student regardless of whether that knowledge is useful; the student is
the only one learning in the exchange; and distillation typically transfers
knowledge only from a single teacher to a single student. We formulate a novel
form of knowledge distillation in which many models can act as both students
and teachers, which we call cooperative distillation. The models cooperate as
follows: a model (the student) identifies specific deficiencies in its
performance and searches for another model (the teacher) that encodes its learned
knowledge into instructional virtual instances via counterfactual instance
generation. Because different models may have different strengths and
weaknesses, all models can act as either students or teachers (cooperation)
when appropriate and only distill knowledge in areas specific to their
strengths (focus). Since counterfactuals as a paradigm are not tied to any
specific algorithm, we can use this method to distill knowledge between
learners of different architectures, algorithms, and even feature spaces. We
demonstrate that our approach not only outperforms baselines such as transfer
learning, self-supervised learning, and multiple knowledge distillation
algorithms on several datasets, but it can also be used in settings where the
aforementioned techniques cannot.
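The abstract sketches a four-step loop: the student diagnoses a deficiency, finds a teacher that is stronger in that area, the teacher emits instructional virtual instances via counterfactual generation, and the student retrains on them. The snippet below is a minimal, hypothetical rendering of that loop with scikit-learn learners; the per-class deficiency measure, the random-search counterfactual generator, and every hyperparameter are illustrative assumptions, not the authors' implementation.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def per_class_accuracy(model, X, y):
    """Per-class accuracy, used here to locate a learner's deficiencies."""
    return {c: float((model.predict(X[y == c]) == c).mean()) for c in np.unique(y)}

def generate_counterfactuals(teacher, seeds, target_class, step=0.25, tries=200, seed=0):
    """Randomly perturb seed instances until the teacher predicts target_class.
    A simple stand-in for the paper's counterfactual instance generation."""
    rng = np.random.default_rng(seed)
    virtual = []
    for x in seeds:
        cand = x.copy()
        for _ in range(tries):
            if teacher.predict(cand[None, :])[0] == target_class:
                virtual.append(cand)
                break
            cand = cand + step * rng.standard_normal(cand.shape)
    return np.array(virtual)

# Two heterogeneous learners trained on disjoint portions of the data.
X, y = make_classification(n_samples=2400, n_informative=8, n_classes=3, random_state=0)
X_a, y_a = X[:1000], y[:1000]          # student's training data
X_b, y_b = X[1000:2000], y[1000:2000]  # teacher's training data
X_val, y_val = X[2000:], y[2000:]      # held out, used to diagnose deficiencies
student = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_a, y_a)
teacher = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_b, y_b)

# 1) The student identifies the class it handles worst (focus).
acc_student = per_class_accuracy(student, X_val, y_val)
weak = min(acc_student, key=acc_student.get)

# 2) It looks for a model that is stronger on that class (cooperation).
if per_class_accuracy(teacher, X_val, y_val)[weak] > acc_student[weak]:
    # 3) The teacher encodes its knowledge as instructional virtual instances.
    seeds = X_a[(y_a == weak) & (student.predict(X_a) != weak)][:50]
    X_virtual = generate_counterfactuals(teacher, seeds, target_class=weak)
    if len(X_virtual):
        y_virtual = np.full(len(X_virtual), weak)
        # 4) The student retrains on its own data plus the virtual instances.
        student.fit(np.vstack([X_a, X_virtual]), np.concatenate([y_a, y_virtual]))
```
Because the generator only queries the teacher's predict function, the same loop applies across architectures and algorithms, which is the learner-agnostic property the abstract emphasizes.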
Related papers
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of requiring the teacher to work on the same task as the student, knowledge is borrowed from a teacher trained on a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z) - Extracting knowledge from features with multilevel abstraction [3.4443503349903124]
Self-knowledge distillation (SKD) aims at transferring the knowledge from a large teacher model to a small student model.
In this paper, we propose a novel SKD method that differs from mainstream methods.
Experiments and ablation studies show its great effectiveness and generalization on various kinds of tasks.
arXiv Detail & Related papers (2021-12-04T02:25:46Z) - Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD).
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z) - Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z) - Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z) - Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z) - Multi-level Knowledge Distillation [13.71183256776644]
We introduce Multi-level Knowledge Distillation (MLKD) to transfer richer representational knowledge from teacher to student networks.
MLKD employs three novel teacher-student similarities: individual similarity, relational similarity, and categorical similarity.
Experiments demonstrate that MLKD outperforms other state-of-the-art methods on both similar-architecture and cross-architecture tasks.
arXiv Detail & Related papers (2020-12-01T15:27:15Z) - Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model into another.
We design data augmentation agents with distinct roles to facilitate knowledge distillation.
We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student.
arXiv Detail & Related papers (2020-04-19T14:22:17Z)
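For reference, the single-teacher methods surveyed above build on the classic soft-target distillation loss of Hinton et al. (2015), in which the student matches the teacher's temperature-softened class probabilities ("dark knowledge") in addition to the hard labels. The sketch below is a minimal PyTorch rendering of that baseline; the temperature T and mixing weight alpha are illustrative defaults, not values taken from any of the listed papers.
```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the scale of the hard-label term
    return alpha * hard + (1.0 - alpha) * soft

# Example usage with random tensors standing in for a batch of logits.
s = torch.randn(8, 10, requires_grad=True)   # student logits
t = torch.randn(8, 10)                       # teacher logits
y = torch.randint(0, 10, (8,))               # ground-truth labels
loss = kd_loss(s, t, y)
loss.backward()
```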
This list is automatically generated from the titles and abstracts of the papers in this site.