Student-Informed Teacher Training
- URL: http://arxiv.org/abs/2412.09149v1
- Date: Thu, 12 Dec 2024 10:34:26 GMT
- Title: Student-Informed Teacher Training
- Authors: Nico Messikommer, Jiaxu Xing, Elie Aljalbout, Davide Scaramuzza
- Abstract summary: Imitation learning with a privileged teacher has proven effective for learning complex control behaviors from high-dimensional inputs, such as images.
In this framework, a teacher is trained with privileged task information, while a student tries to predict the actions of the teacher with more limited observations.
We propose a framework for joint training of the teacher and student policies, encouraging the teacher to learn behaviors that can be imitated by the student.
- Score: 19.895253502371588
- Abstract: Imitation learning with a privileged teacher has proven effective for learning complex control behaviors from high-dimensional inputs, such as images. In this framework, a teacher is trained with privileged task information, while a student tries to predict the actions of the teacher with more limited observations, e.g., in a robot navigation task, the teacher might have access to distances to nearby obstacles, while the student only receives visual observations of the scene. However, privileged imitation learning faces a key challenge: the student might be unable to imitate the teacher's behavior due to partial observability. This problem arises because the teacher is trained without considering whether the student is capable of imitating the learned behavior. To address this teacher-student asymmetry, we propose a framework for joint training of the teacher and student policies, encouraging the teacher to learn behaviors that can be imitated by the student despite the latter's limited access to information and partial observability. Based on the performance bound in imitation learning, we add (i) the approximated action difference between teacher and student as a penalty term to the reward function of the teacher, and (ii) a supervised teacher-student alignment step. We motivate our method with a maze navigation task and demonstrate its effectiveness on complex vision-based quadrotor flight and manipulation tasks.
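Penalty term (i) from the abstract can be sketched as a reward shaping function. This is a minimal illustration, not the authors' implementation: the function name `teacher_reward` and the penalty weight `beta` are hypothetical, and the action difference here is a plain Euclidean norm rather than the paper's approximation.

```python
import numpy as np

def teacher_reward(task_reward, teacher_action, student_action, beta=0.1):
    """Shape the teacher's reward so it is penalized for taking
    actions the student cannot reproduce from its limited observations.

    The penalty is the action difference between the teacher and
    student policies, scaled by a weight `beta` (hypothetical name).
    """
    action_gap = np.linalg.norm(
        np.asarray(teacher_action) - np.asarray(student_action)
    )
    return task_reward - beta * action_gap
```

When teacher and student agree, the task reward passes through unchanged; the larger the gap, the more the teacher is discouraged from behaviors the student cannot imitate.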
Related papers
- Representational Alignment Supports Effective Machine Teaching [81.19197059407121]
GRADE is a new controlled experimental setting to study pedagogy and representational alignment.
We find that improved representational alignment with a student improves student learning outcomes.
However, this effect is moderated by the size and representational diversity of the class being taught.
arXiv Detail & Related papers (2024-06-06T17:48:24Z)
- Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios [3.638198517970729]
Learning from Demonstration can be an efficient way to train systems with analogous agents.
However, naively replicating demonstrations that are out of bounds for the Student's capability can limit efficient learning.
We present a Teacher-Student learning framework specifically tailored to address the challenge of heterogeneity between the Teacher and Student agents.
arXiv Detail & Related papers (2024-05-23T05:52:42Z)
- Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization [84.86241161706911]
We show that teacher LLMs can indeed intervene on student reasoning to improve their performance.
We also demonstrate that in multi-turn interactions, teacher explanations generalize: students learn from the explained data.
We verify that misaligned teachers can lower student performance to random chance by intentionally misleading them.
arXiv Detail & Related papers (2023-06-15T17:27:20Z)
- Sparse Teachers Can Be Dense with Knowledge [35.83646432932867]
We propose a sparse teacher trick under the guidance of an overall knowledgeable score for each teacher parameter.
The aim is to ensure that the expressive parameters are retained while the student-unfriendly ones are removed.
Experiments on the GLUE benchmark show that the proposed sparse teachers can be dense with knowledge and lead to students with compelling performance.
arXiv Detail & Related papers (2022-10-08T05:25:34Z)
- Know Thy Student: Interactive Learning with Gaussian Processes [11.641731210416102]
Our work proposes a simple diagnosis algorithm which uses Gaussian processes for inferring student-related information, before constructing a teaching dataset.
We study this in the offline reinforcement learning setting where the teacher must provide demonstrations to the student and avoid sending redundant trajectories.
Our experiments highlight the importance of diagnosing before teaching and demonstrate how students can learn more efficiently with the help of an interactive teacher.
arXiv Detail & Related papers (2022-04-26T04:43:57Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Self-Training with Differentiable Teacher [80.62757989797095]
Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks.
The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels, and the student makes predictions.
We propose a differentiable self-training method that treats the teacher-student framework as a Stackelberg game.
arXiv Detail & Related papers (2021-09-15T02:06:13Z)
- Learning to Teach with Student Feedback [67.41261090761834]
Interactive Knowledge Distillation (IKD) allows the teacher to learn to teach from the feedback of the student.
IKD trains the teacher model to generate specific soft targets at each training step for a certain student.
Joint optimization for both teacher and student is achieved by two iterative steps.
arXiv Detail & Related papers (2021-09-10T03:01:01Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
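The DPD blurb's "beneficial knowledge from the peer learner" can be illustrated with a toy sketch: distill from the peer only where its value estimate exceeds one's own. This is a hedged simplification, not the paper's algorithm; the function name `dpd_loss`, the tabular Q-value layout, and the weight `alpha` are all hypothetical.

```python
def dpd_loss(own_q, peer_q, state, action, alpha=0.5):
    """Toy per-(state, action) distillation term for a student-student pair.

    The imitation term is weighted by the peer's value advantage,
    clipped at zero, so a learner only imitates where the peer
    appears to know something beneficial.
    """
    advantage = max(peer_q[state][action] - own_q[state][action], 0.0)
    imitation = (own_q[state][action] - peer_q[state][action]) ** 2
    return alpha * advantage * imitation
```

When the peer's estimate is no better, the advantage clips to zero and the learner ignores it; otherwise the squared gap pulls the learner's values toward the peer's.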
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.