Student-Teacher Learning from Clean Inputs to Noisy Inputs
- URL: http://arxiv.org/abs/2103.07600v1
- Date: Sat, 13 Mar 2021 02:29:35 GMT
- Title: Student-Teacher Learning from Clean Inputs to Noisy Inputs
- Authors: Guanzhe Hong, Zhiyuan Mao, Xiaojun Lin, Stanley H. Chan
- Abstract summary: Feature-based student-teacher learning is empirically successful in transferring the knowledge from a pre-trained teacher network to the student network.
We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks.
- Score: 20.428469418957544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature-based student-teacher learning, a training method that encourages the
student's hidden features to mimic those of the teacher network, is empirically
successful in transferring the knowledge from a pre-trained teacher network to
the student network. Furthermore, recent empirical results demonstrate that
the teacher's features can boost the student network's generalization even when
the student's input sample is corrupted by noise. However, there is a lack of
theoretical insights into why and when this method of transferring knowledge
can be successful between such heterogeneous tasks. We analyze this method
theoretically using deep linear networks, and experimentally using nonlinear
networks. We identify three factors vital to the method's success: (1)
whether the student is trained to zero training loss; (2) how knowledgeable the
teacher is on the clean-input problem; (3) how the teacher decomposes its
knowledge in its hidden features. Lack of proper control in any of the three
factors leads to failure of the student-teacher learning method.
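The feature-mimicking setup the abstract describes can be illustrated with a small numpy sketch: a deep linear "teacher" is probed on clean inputs, and a student seeing noisy versions of the same inputs is trained so its hidden features match the teacher's. This is a minimal toy, not the authors' code; the dimensions, noise level, and two-layer linear architecture are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 4, 256                      # input dim, hidden dim, samples

# Clean data generated by a ground-truth linear map
W_true = rng.normal(size=(1, d))
X = rng.normal(size=(n, d))              # clean inputs (teacher's domain)
y = X @ W_true.T

# "Pre-trained" teacher: first layer of a deep linear network; its hidden
# features on CLEAN inputs are the distillation targets
T1 = rng.normal(size=(h, d)) / np.sqrt(d)
teacher_feat = X @ T1.T

# The student only ever sees noisy inputs
X_noisy = X + 0.3 * rng.normal(size=X.shape)

# Train the student's first layer to mimic the teacher's hidden features
S1 = rng.normal(size=(h, d)) / np.sqrt(d)
init_loss = np.mean((X_noisy @ S1.T - teacher_feat) ** 2)
lr = 1e-2
for _ in range(2000):
    resid = X_noisy @ S1.T - teacher_feat
    S1 -= lr * 2 * resid.T @ X_noisy / n   # gradient of the squared feature loss

final_loss = np.mean((X_noisy @ S1.T - teacher_feat) ** 2)

# Fit the student's output layer on top of the distilled features
feat = X_noisy @ S1.T
S2 = y.T @ np.linalg.pinv(feat.T)        # least-squares output layer
mse_distilled = np.mean((feat @ S2.T - y) ** 2)
print(f"feature loss {init_loss:.3f} -> {final_loss:.3f}, "
      f"student MSE on noisy inputs: {mse_distilled:.3f}")
```

The residual feature loss never reaches zero because the input noise destroys information, which connects to the paper's first factor: whether the student should be driven to zero training loss at all.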
Related papers
- Random Teachers are Good Teachers [19.74244993871716]
We investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.
When distilling a student into such a random teacher, we observe a strong improvement of the distilled student over its teacher in terms of probing accuracy.
arXiv Detail & Related papers (2023-02-23T15:26:08Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Does Knowledge Distillation Really Work? [106.38447017262183]
We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood.
We identify difficulties in optimization as a key reason for why the student is unable to match the teacher.
arXiv Detail & Related papers (2021-06-10T17:44:02Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is flexible and can easily be combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z)
- Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning materials and the teacher/student updates can be repeated, iteratively improving the network's capability.
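The core PASM step, using a trained teacher's input gradient to locate the single most informative position, can be sketched in numpy. This is a hedged toy: a linear-sigmoid model stands in for the teacher network, and the perturbation size is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
x = rng.normal(size=d)                 # one "image" flattened to d features
w_teacher = rng.normal(size=d)         # toy linear teacher (stand-in for a CNN)
y = 1.0                                # target label

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Gradient of the teacher's loss w.r.t. the INPUT: a point attack perturbs
# only the coordinate with the largest gradient magnitude, i.e. the position
# the teacher deems most informative for the task.
p = sigmoid(w_teacher @ x)
grad_x = (p - y) * w_teacher           # d(BCE)/dx for a linear-sigmoid model
pos = int(np.argmax(np.abs(grad_x)))   # most informative position
x_adv = x.copy()
x_adv[pos] += 0.5 * np.sign(grad_x[pos])   # step along the loss-ascent direction

print(f"attacked position: {pos}")
```

In the real method the perturbed samples then serve as new learning material for the student, which is what makes the generation adaptive.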
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
- Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location [4.359338565775979]
In this work we propose a student-teacher network via reinforcement learning to deal with this specific problem.
A representation of the weights of the student network is treated as the state and is fed as an input to the teacher network.
The teacher network's action is to select the most appropriate batch of data to train the student network on from a training set sorted according to entropy.
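The entropy-sorted batch selection described here can be sketched as a toy loop: a logistic-regression student's predictive entropy ranks the training samples into buckets, and a simple epsilon-greedy bandit (an illustrative stand-in for the paper's RL teacher network) picks which bucket to train on, rewarded by the improvement in validation loss. All dimensions, rates, and the bandit itself are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_buckets = 600, 5, 4

# Toy binary classification data; the student is logistic regression
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)
X_val, y_val = X[:100], y[:100]
X_tr, y_tr = X[100:], y[100:]

w = np.zeros(d)                        # student weights (the teacher's "state")
q = np.zeros(n_buckets)                # teacher's value estimate per action
counts = np.zeros(n_buckets)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def val_loss(w):
    p = np.clip(sigmoid(X_val @ w), 1e-9, 1 - 1e-9)
    return -np.mean(y_val * np.log(p) + (1 - y_val) * np.log(1 - p))

for step in range(200):
    # Sort training samples by the student's current predictive entropy
    p = np.clip(sigmoid(X_tr @ w), 1e-9, 1 - 1e-9)
    ent = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    buckets = np.array_split(np.argsort(ent), n_buckets)

    # Epsilon-greedy teacher picks which entropy bucket to train on
    a = int(rng.integers(n_buckets)) if rng.random() < 0.1 else int(np.argmax(q))
    idx = buckets[a]

    # Student takes one gradient step on the chosen batch
    before = val_loss(w)
    g = X_tr[idx].T @ (sigmoid(X_tr[idx] @ w) - y_tr[idx]) / len(idx)
    w -= 0.5 * g
    reward = before - val_loss(w)      # reward = validation improvement

    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]

print(f"final validation loss: {val_loss(w):.3f}")
```

The paper's teacher is a trained network acting on a representation of the student's weights rather than a bandit, but the state/action/reward structure is the same.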
arXiv Detail & Related papers (2020-07-01T15:00:43Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
- Understanding the Power and Limitations of Teaching with Imperfect Knowledge [30.588367257209388]
We study the interaction between a teacher and a student/learner where the teacher selects training examples for the learner to learn a specific task.
Inspired by real-world applications of machine teaching in education, we consider the setting where teacher's knowledge is limited and noisy.
We show how the teacher's imperfect knowledge affects its solution to the corresponding machine teaching problem when constructing optimal teaching sets.
arXiv Detail & Related papers (2020-03-21T17:53:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.