Curriculum Design for Teaching via Demonstrations: Theory and
Applications
- URL: http://arxiv.org/abs/2106.04696v1
- Date: Tue, 8 Jun 2021 21:15:00 GMT
- Title: Curriculum Design for Teaching via Demonstrations: Theory and
Applications
- Authors: Gaurav Yengera, Rati Devidze, Parameswaran Kamalaruban, Adish Singla
- Abstract summary: We study how to design a personalized curriculum over demonstrations to speed up the learner's convergence.
We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC).
- Score: 29.71112499480574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of teaching via demonstrations in sequential
decision-making settings. In particular, we study how to design a personalized
curriculum over demonstrations to speed up the learner's convergence. We
provide a unified curriculum strategy for two popular learner models: Maximum
Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy
Behavioral Cloning (CrossEnt-BC). Our unified strategy induces a ranking over
demonstrations based on a notion of difficulty scores computed w.r.t. the
teacher's optimal policy and the learner's current policy. Compared to the
state of the art, our strategy doesn't require access to the learner's internal
dynamics and still enjoys similar convergence guarantees under mild technical
conditions. Furthermore, we adapt our curriculum strategy to teach a learner
using domain knowledge in the form of task-specific difficulty scores when the
teacher's optimal policy is unknown. Experiments on a car driving simulator
environment and shortest path problems in a grid-world environment demonstrate
the effectiveness of our proposed curriculum strategy.
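To make the ranking idea from the abstract concrete, here is a minimal Python sketch of a difficulty-score-based curriculum. The tabular policy representation, the log-likelihood-ratio form of the score, and all function names are illustrative assumptions; the paper's exact difficulty scores for MaxEnt-IRL and CrossEnt-BC differ in their details.
```python
import numpy as np

def difficulty_score(demo, teacher_policy, learner_policy):
    """Score one demonstration w.r.t. the teacher's optimal policy and the
    learner's current policy (illustrative form, not the paper's exact one)."""
    # demo is a list of (state, action) pairs; each policy maps
    # state -> {action: probability}.
    teacher_ll = sum(np.log(teacher_policy[s][a]) for s, a in demo)
    learner_ll = sum(np.log(learner_policy[s][a]) for s, a in demo)
    # A demonstration is "difficult" when the teacher considers it likely
    # but the learner's current policy still explains it poorly.
    return teacher_ll - learner_ll

def pick_next_demo(demos, teacher_policy, learner_policy):
    """Greedy curriculum step: show the currently most informative demonstration."""
    scores = [difficulty_score(d, teacher_policy, learner_policy) for d in demos]
    return demos[int(np.argmax(scores))]
```
Note that this sketch only needs the learner's current policy, not its internal dynamics, which mirrors the abstract's claim about the strategy's requirements.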
Related papers
- Learning to Steer Markovian Agents under Model Uncertainty [23.603487812521657]
We study how to design additional rewards to steer multi-agent systems towards desired policies.
Motivated by the limitations of existing works, we consider a new category of learning dynamics called Markovian agents.
We learn a history-dependent steering strategy to handle the inherent model uncertainty about the agents' learning dynamics.
arXiv Detail & Related papers (2024-07-14T14:01:38Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
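As a rough illustration of RLIF's central idea, the sketch below treats the human intervention signal itself as a (negative) reward inside an otherwise standard rollout loop. The `env`, `agent`, and `human_intervened` interfaces are hypothetical stand-ins, not RLIF's actual API.
```python
def rollout_with_intervention_reward(env, agent, human_intervened, max_steps=1000):
    """Collect one episode where human interventions define the reward signal."""
    state = env.reset()
    transitions = []
    for _ in range(max_steps):
        action = agent.act(state)
        # No task reward function is assumed: the event of a human
        # intervening is itself treated as a negative reward.
        reward = -1.0 if human_intervened(state, action) else 0.0
        next_state, done = env.step(action)  # hypothetical env interface
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return transitions  # suitable for any off-policy RL update
```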
- Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback [4.174296652683762]
We study the problem of teaching via demonstrations in sequential decision-making tasks.
In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this problem.
arXiv Detail & Related papers (2023-09-16T21:12:04Z)
- Robust Learning from Observation with Model Misspecification [33.92371002674386]
Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
arXiv Detail & Related papers (2022-02-12T07:04:06Z)
- RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions [10.34673089426247]
We propose a framework for optimizing teaching strategies by constructing a virtual model of the student.
Our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.
arXiv Detail & Related papers (2021-07-31T15:42:03Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Distribution Matching for Machine Teaching [64.39292542263286]
Machine teaching is an inverse problem of machine learning that aims at steering the student learner towards its target hypothesis.
Previous studies on machine teaching focused on balancing the teaching risk and cost to find the best teaching examples.
This paper presents a distribution matching-based machine teaching strategy.
arXiv Detail & Related papers (2021-05-06T09:32:57Z)
- The Sample Complexity of Teaching-by-Reinforcement on Q-Learning [40.37954633873304]
We study the sample complexity of teaching, termed the "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm.
In this paper, we focus on a specific family of reinforcement learning algorithms, Q-learning, and characterize the TDim under different teachers with varying control power over the environment.
Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results.
arXiv Detail & Related papers (2020-06-16T17:06:04Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
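The following sketch shows one way such a student-student update could look: each learner adds a distillation term toward its peer only on states where the peer currently appears more competent. The agent interface, the value-based filtering rule, and the weighting `beta` are all assumptions for illustration, not the paper's exact algorithm.
```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def dual_distillation_update(learner, peer, batch_states, beta=0.5):
    """One update for `learner`, distilling from `peer` where the peer looks better."""
    rl_loss = learner.rl_loss(batch_states)  # the learner's usual RL objective
    # Heuristic notion of "beneficial knowledge": states where the peer's
    # value estimate exceeds the learner's own.
    better = [s for s in batch_states if peer.value(s) > learner.value(s)]
    distill_loss = sum(kl_divergence(peer.action_probs(s), learner.action_probs(s))
                       for s in better)
    learner.optimize(rl_loss + beta * distill_loss)  # hypothetical optimizer hook
```
Running this update symmetrically for both learners gives the student-student dynamic the summary describes, with no fixed teacher policy.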
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
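A compressed sketch of the reward-conditioned idea: label each (state, action) pair in the agent's own experience with the return actually achieved afterwards, fit a policy conditioned on that return by supervised learning, and at test time ask for a high target return. All interfaces here are hypothetical, not the paper's actual code.
```python
def train_reward_conditioned_policy(policy, trajectories, epochs=10):
    """Fit pi(a | s, target_return) by supervised learning on own experience."""
    for _ in range(epochs):
        for traj in trajectories:  # traj: list of (state, action, reward)
            # Compute the return-to-go from each step onward.
            returns, g = [], 0.0
            for _, _, r in reversed(traj):
                g += r
                returns.append(g)
            returns.reverse()
            for (s, a, _), z in zip(traj, returns):
                # Supervised step: predict the action that, from state s,
                # actually led to return z. No demonstrations are needed.
                policy.fit_step(state=s, target_return=z, action=a)

# At evaluation time, condition on a high target return to request
# better-than-average behavior, e.g. action = policy.act(state, target_return=z_high).
```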