Reinforcement Teaching
- URL: http://arxiv.org/abs/2204.11897v1
- Date: Mon, 25 Apr 2022 18:04:17 GMT
- Title: Reinforcement Teaching
- Authors: Alex Lewandowski, Calarina Muslimani, Matthew E. Taylor, Jun Luo, Dale Schuurmans
- Abstract summary: We propose Reinforcement Teaching: a framework for meta-learning in which a teaching policy is learned, through reinforcement, to control a student's learning process.
The student's learning process is modelled as a Markov reward process and the teacher, with its action-space, interacts with the induced Markov decision process.
We show that, for many learning processes, the student's learnable parameters form a Markov state. To avoid having the teacher learn directly from parameters, we propose the Parameter Embedder, which learns a representation of a student's state from its input/output behaviour.
- Score: 43.80089037901853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Reinforcement Teaching: a framework for meta-learning in which a
teaching policy is learned, through reinforcement, to control a student's
learning process. The student's learning process is modelled as a Markov reward
process and the teacher, with its action-space, interacts with the induced
Markov decision process. We show that, for many learning processes, the
student's learnable parameters form a Markov state. To avoid having the teacher
learn directly from parameters, we propose the Parameter Embedder that learns a
representation of a student's state from its input/output behaviour. Next, we
use learning progress to shape the teacher's reward towards maximizing the
student's performance. To demonstrate the generality of Reinforcement Teaching,
we conducted experiments in which a teacher learns to significantly improve
supervised and reinforcement learners by using a combination of learning
progress reward and a Parameter Embedded state. These results show that
Reinforcement Teaching is not only an expressive framework capable of unifying
different approaches, but also provides meta-learning with the plethora of
tools from reinforcement learning.
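
The teacher-student loop described in the abstract can be illustrated with a minimal sketch. Everything here is a hypothetical toy setup and not the authors' implementation: the student is a linear regressor trained by gradient descent, the teacher's action picks a step size from an assumed set `STEP_SIZES`, the teacher's observation is the student's outputs on a few fixed probe inputs (a crude stand-in for a learned Parameter Embedder), and the reward is learning progress, taken here as the drop in the student's loss after one update.

```python
import numpy as np


class StudentEnv:
    """Toy teaching environment (illustrative assumption, not from the paper).

    The student is a linear model trained by full-batch gradient descent.
    Each teacher action selects the step size for one student update.
    """

    STEP_SIZES = (0.001, 0.01, 0.1)  # hypothetical teacher action set

    def __init__(self, seed=0, n_probe=4):
        rng = np.random.default_rng(seed)
        self.X = rng.normal(size=(64, 3))
        self.y = self.X @ np.array([1.0, -2.0, 0.5])
        self.probe = self.X[:n_probe]  # fixed inputs used to observe the student
        self.reset()

    def loss(self):
        # Mean squared error of the student on its training data.
        return float(np.mean((self.X @ self.w - self.y) ** 2))

    def state(self):
        # Observe the student through its input/output behaviour only,
        # rather than through its raw parameters.
        return self.probe @ self.w

    def reset(self):
        self.w = np.zeros(3)
        return self.state()

    def step(self, action):
        before = self.loss()
        grad = 2.0 * self.X.T @ (self.X @ self.w - self.y) / len(self.y)
        self.w = self.w - self.STEP_SIZES[action] * grad
        reward = before - self.loss()  # learning progress for this update
        return self.state(), reward


def run_episode(env, policy, horizon=20):
    """Roll out a teaching policy and return the summed learning progress."""
    state = env.reset()
    total = 0.0
    for _ in range(horizon):
        state, reward = env.step(policy(state))
        total += reward
    return total


if __name__ == "__main__":
    env = StudentEnv()
    for action, step_size in enumerate(StudentEnv.STEP_SIZES):
        ret = run_episode(env, lambda s, a=action: a)
        print(f"step size {step_size}: return {ret:.4f}, final loss {env.loss():.6f}")
```

The fixed policies in the usage loop are placeholders; a learned teaching policy would map the observed state to an action instead. With this reward, the undiscounted return telescopes to the student's initial loss minus its final loss, so maximizing the return in this sketch amounts to minimizing the student's final loss.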
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based, teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Meta Learning for Knowledge Distillation [12.716258111815312]
We show the teacher network can learn to better transfer knowledge to the student network.
We introduce a pilot update mechanism to improve the alignment between the inner-learner and meta-learner.
arXiv Detail & Related papers (2021-06-08T17:59:03Z)
- Learning by Teaching, with Application to Neural Architecture Search [10.426533624387305]
We propose a novel ML framework referred to as learning by teaching (LBT).
In LBT, a teacher model improves itself by teaching a student model to learn well.
Based on how the student performs on a validation dataset, the teacher re-learns its model and re-teaches the student until the student achieves great validation performance.
arXiv Detail & Related papers (2021-03-11T23:50:38Z)
- Teaching to Learn: Sequential Teaching of Agents with Inner States [20.556373950863247]
We introduce a multi-agent formulation in which learners' inner state may change with the teaching interaction.
In order to teach such learners, we propose an optimal control approach that takes the future performance of the learner after teaching into account.
arXiv Detail & Related papers (2020-09-14T07:03:15Z)
- Mastering Rate based Curriculum Learning [78.45222238426246]
We argue that the notion of learning progress itself has several shortcomings that lead to a low sample efficiency for the learner.
We propose a new algorithm, based on the notion of mastering rate, that significantly outperforms learning progress-based algorithms.
arXiv Detail & Related papers (2020-08-14T16:34:01Z)
- Interaction-limited Inverse Reinforcement Learning [50.201765937436654]
We present two different training strategies: Curriculum Inverse Reinforcement Learning (CIRL) covering the teacher's perspective, and Self-Paced Inverse Reinforcement Learning (SPIRL) focusing on the learner's perspective.
Using experiments in simulation and experiments with a real robot learning a task from a human demonstrator, we show that our training strategies enable faster training than a random teacher for CIRL and than a batch learner for SPIRL.
arXiv Detail & Related papers (2020-07-01T12:31:52Z)
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.