Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer
- URL: http://arxiv.org/abs/2402.06783v1
- Date: Fri, 9 Feb 2024 21:16:43 GMT
- Title: Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer
- Authors: Feiyang Wu, Zhaoyuan Gu, Ye Zhao, Anqi Wu
- Abstract summary: We propose a sample-efficient learning framework termed Learn to Teach (L2T) that recycles experience collected by the teacher agent.
We show that a single-loop algorithm can train both the teacher and student agents under both Reinforcement Learning and Inverse Reinforcement Learning contexts.
- Score: 5.731477362725785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Simulation-to-reality (sim-to-real) transfer is a fundamental problem for
robot learning. Domain Randomization, which adds randomization during training,
is a powerful technique that effectively addresses the sim-to-real gap.
However, the noise in observations makes learning significantly harder.
Recently, studies have shown that employing a teacher-student learning paradigm
can accelerate training in randomized environments. Learned with privileged
information, a teacher agent can instruct the student agent to operate in noisy
environments. However, this approach is often not sample-efficient, as the
experience collected by the teacher is discarded completely when training the
student, wasting information revealed by the environment. In this work, we
extend the teacher-student learning paradigm by proposing a sample-efficient
learning framework termed Learn to Teach (L2T) that recycles experience
collected by the teacher agent. We observe that the dynamics of the
environments for both agents remain unchanged, and the state space of the
teacher is coupled with the observation space of the student. We show that a
single-loop algorithm can train both the teacher and student agents under both
Reinforcement Learning and Inverse Reinforcement Learning contexts. We
implement variants of our methods, conduct experiments on the MuJoCo benchmark,
and apply our methods to the Cassie robot locomotion problem. Extensive
experiments show that our method achieves competitive performance while
requiring environment interaction only through the teacher.
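To make the recycling idea concrete, here is a minimal sketch of the single-loop scheme the abstract describes: only the teacher interacts with the environment, and every transition it collects is reused to supervise the student through the assumed state-observation coupling. The noise model, the `obs_from_state` coupling, and the agent interfaces are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def obs_from_state(state, noise_std=0.05):
    """Assumed coupling: the student sees a noisy view of the privileged state."""
    return state + np.random.normal(0.0, noise_std, size=state.shape)

def learn_to_teach_loop(env, teacher, student, num_steps):
    """One loop trains both agents; only the teacher touches the environment."""
    state = env.reset()
    for _ in range(num_steps):
        action = teacher.act(state)                  # teacher uses privileged state
        next_state, reward, done = env.step(action)
        teacher.update(state, action, reward, next_state, done)  # RL update
        # Recycle the same transition for the student: it learns to reproduce
        # the teacher's action from its own noisy observation, so no extra
        # environment interaction is spent on the student.
        student.update(obs_from_state(state), action)
        state = env.reset() if done else next_state
```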
Related papers
- Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios [3.638198517970729]
Learning from Demonstration can be an efficient way to train systems with analogous agents.
However, naively replicating demonstrations that are beyond the Student's capability can limit efficient learning.
We present a Teacher-Student learning framework specifically tailored to address the challenge of heterogeneity between the Teacher and Student agents.
arXiv Detail & Related papers (2024-05-23T05:52:42Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves on SFT with a significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method comprising two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
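As a rough illustration of the fragment-wise moving-average update just described (the fragment layout and decay value are assumptions, not the paper's settings):

```python
def ema_update_fragments(teacher_fragments, student_fragments, decay=0.99):
    """Move each teacher fragment toward its student counterpart via an EMA.

    Both arguments are assumed to be dicts mapping a fragment name to a list
    of float numpy arrays holding that fragment's parameters.
    """
    for name, teacher_params in teacher_fragments.items():
        for t_p, s_p in zip(teacher_params, student_fragments[name]):
            # teacher <- decay * teacher + (1 - decay) * student, in place
            t_p *= decay
            t_p += (1.0 - decay) * s_p
```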
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Teacher-student curriculum learning for reinforcement learning [1.7259824817932292]
Reinforcement learning (RL) is a popular paradigm for sequential decision-making problems.
The sample inefficiency of deep reinforcement learning methods is a significant obstacle when applying RL to real-world problems.
We propose a teacher-student curriculum learning setting where we simultaneously train a teacher that selects tasks for the student while the student learns how to solve the selected task.
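A toy rendering of that simultaneous loop might look as follows; the progress-based selection rule and the `student` interface are invented for illustration, not the paper's exact criterion.

```python
import random

def curriculum_loop(tasks, student, steps):
    """Teacher picks the next task while the student trains on it."""
    progress = {t: 0.0 for t in tasks}
    for _ in range(steps):
        # Teacher policy (assumed): favor tasks with recent learning progress,
        # with a little noise so untried tasks still get sampled.
        task = max(tasks, key=lambda t: progress[t] + 0.1 * random.random())
        score_before = student.evaluate(task)
        student.train_on(task)
        progress[task] = student.evaluate(task) - score_before
    return student
```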
arXiv Detail & Related papers (2022-10-31T14:45:39Z)
- Know Thy Student: Interactive Learning with Gaussian Processes [11.641731210416102]
Our work proposes a simple diagnosis algorithm which uses Gaussian processes for inferring student-related information, before constructing a teaching dataset.
We study this in the offline reinforcement learning setting where the teacher must provide demonstrations to the student and avoid sending redundant trajectories.
Our experiments highlight the importance of diagnosing before teaching and demonstrate how students can learn more efficiently with the help of an interactive teacher.
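A minimal sketch of that diagnose-then-teach step using scikit-learn's GP regressor; the task features, scores, and acquisition rule below are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Fit a GP to the student's performance on a few probe tasks (placeholder data).
probe_tasks = np.random.rand(8, 2)    # task features shown to the student so far
probe_scores = np.random.rand(8)      # student's measured performance on them

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5))
gp.fit(probe_tasks, probe_scores)

# Diagnose before teaching: estimate competence on candidate tasks and teach
# where the student looks weakest, skipping redundant demonstrations.
candidates = np.random.rand(100, 2)
mean, std = gp.predict(candidates, return_std=True)
next_demo_task = candidates[np.argmin(mean - std)]
```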
arXiv Detail & Related papers (2022-04-26T04:43:57Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
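The latent-action idea can be sketched roughly as below, using a simple reconstruction loss rather than TRAIL's energy-based objective; dimensions and data are placeholders.

```python
import torch
import torch.nn as nn

# Encode each observed transition (s, s') into a latent action z, and require
# z plus s to predict s'; downstream imitation can then act in z-space.
state_dim, latent_dim = 16, 4
encoder = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Sequential(nn.Linear(state_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
opt = torch.optim.Adam([*encoder.parameters(), *dynamics.parameters()], lr=1e-3)

s = torch.randn(128, state_dim)       # offline transitions (placeholder data)
s_next = torch.randn(128, state_dim)

z = encoder(torch.cat([s, s_next], dim=1))                        # infer latent action
loss = ((dynamics(torch.cat([s, z], dim=1)) - s_next) ** 2).mean()  # predict s'
opt.zero_grad()
loss.backward()
opt.step()
```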
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that incorporates the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Interaction-limited Inverse Reinforcement Learning [50.201765937436654]
We present two different training strategies: Curriculum Inverse Reinforcement Learning (CIRL) covering the teacher's perspective, and Self-Paced Inverse Reinforcement Learning (SPIRL) focusing on the learner's perspective.
Using experiments in simulation and with a real robot learning a task from a human demonstrator, we show that our training strategies enable faster training than a random teacher for CIRL and than a batch learner for SPIRL.
arXiv Detail & Related papers (2020-07-01T12:31:52Z)
- Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning.
This adaptation uses less than 0.2% of the data necessary to learn the task from scratch.
We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)
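The adaptation recipe summarized above follows a familiar pattern: restore pre-trained weights, then keep optimizing an off-policy objective on data from the new variation. A hedged PyTorch sketch, with the network shape, checkpoint path, and simplified TD-style loss all assumed:

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained Q-network; shape and checkpoint path are assumptions.
policy = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 8))
policy.load_state_dict(torch.load("pretrained_policy.pt"))   # reuse prior training
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)   # small LR: adapt, don't relearn

def td_loss(net, obs, actions, returns):
    """Simplified off-policy objective: regress chosen-action values to returns."""
    q = net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    return ((q - returns) ** 2).mean()

# Stand-in for a replay buffer filled in the new task variation; per the
# summary above, far less data is needed than learning from scratch.
new_variation_batches = [(torch.randn(32, 64), torch.randint(0, 8, (32,)), torch.randn(32))]

for obs, actions, returns in new_variation_batches:
    loss = td_loss(policy, obs, actions, returns)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```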