Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
- URL: http://arxiv.org/abs/2309.09095v1
- Date: Sat, 16 Sep 2023 21:12:04 GMT
- Title: Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
- Authors: Rustam Zayanov, Francisco S. Melo, Manuel Lopes
- Abstract summary: We study the problem of teaching via demonstrations in sequential decision-making tasks.
In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this problem.
- Score: 4.174296652683762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of teaching via demonstrations in sequential
decision-making tasks. In particular, we focus on the situation when the
teacher has no access to the learner's model and policy, and the feedback from
the learner is limited to trajectories that start from states selected by the
teacher. The necessity to select the starting states and infer the learner's
policy creates an opportunity for using the methods of inverse reinforcement
learning and active learning by the teacher. In this work, we formalize the
teaching process with limited feedback and propose an algorithm that solves
this teaching problem. The algorithm uses a modified version of the active
value-at-risk method to select the starting states, a modified maximum causal
entropy algorithm to infer the policy, and the difficulty score ratio method to
choose the teaching demonstrations. We test the algorithm in a synthetic car
driving environment and conclude that the proposed algorithm is an effective
solution when the learner's feedback is limited.
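
Below is a minimal, self-contained sketch of the interactive teaching loop the abstract describes, on a toy chain MDP. The environment, the count-based estimate of the learner's policy, the value-gap criterion for picking starting states, and the value-gap ratio for picking demonstrations are simplified stand-ins chosen for illustration; they are assumptions of this sketch, not the paper's actual active value-at-risk, modified maximum causal entropy, or difficulty score ratio procedures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic chain MDP: states 0..N-1, actions 0 (left) / 1 (right).
N, GAMMA, H = 6, 0.9, 8                   # states, discount, rollout horizon
true_w = np.zeros(N); true_w[-1] = 1.0    # true reward: 1 for being in the rightmost state

def step(s, a):
    return min(s + 1, N - 1) if a == 1 else max(s - 1, 0)

def soft_policy(w):
    """Max-causal-entropy-style policy pi(a|s) = exp(Q(s,a) - V(s)) via soft value iteration."""
    V = np.zeros(N)
    for _ in range(100):
        Q = np.array([[w[s] + GAMMA * V[step(s, a)] for a in (0, 1)] for s in range(N)])
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))   # numerically stable log-sum-exp
    return np.exp(Q - V[:, None])                             # shape (N, 2)

def rollout(policy, s0):
    """Sample a length-H trajectory [(state, action), ...] starting at s0."""
    traj, s = [], s0
    for _ in range(H):
        a = rng.choice(2, p=policy[s])
        traj.append((s, a))
        s = step(s, a)
    return traj

def value(policy, s0, w, n=30):
    """Monte-Carlo estimate of the discounted return of `policy` from s0 under reward w."""
    total = 0.0
    for _ in range(n):
        ret, g, s = 0.0, 1.0, s0
        for _ in range(H):
            ret += g * w[s]
            g *= GAMMA
            s = step(s, rng.choice(2, p=policy[s]))
        total += ret
    return total / n

def feature_counts(traj):
    """Discounted one-hot state-feature counts of a trajectory."""
    phi, g = np.zeros(N), 1.0
    for s, _ in traj:
        phi[s] += g
        g *= GAMMA
    return phi

# Learner: MaxEnt-IRL-style reward weights, updated by feature matching on teacher demos.
learner_w = np.zeros(N)

def learner_update(demo, lr=0.5):
    global learner_w
    pi = soft_policy(learner_w)
    expected = np.mean([feature_counts(rollout(pi, demo[0][0])) for _ in range(20)], axis=0)
    learner_w += lr * (feature_counts(demo) - expected)   # step toward matching demo feature counts

# Teacher: knows true_w but never sees learner_w; it only observes requested trajectories.
teacher_opt = soft_policy(true_w * 20.0)    # sharpened soft policy used as the demonstration policy
observed = []                               # learner trajectories collected so far (limited feedback)

def estimate_learner_policy():
    """Count-based policy estimate (crude stand-in for the paper's modified MaxEnt inference)."""
    counts = np.ones((N, 2))                # Laplace smoothing: uniform where nothing was observed
    for traj in observed:
        for s, a in traj:
            counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

for t in range(10):
    est = estimate_learner_policy()
    v_opt = np.array([value(teacher_opt, s, true_w) for s in range(N)])
    gaps = v_opt - np.array([value(est, s, true_w) for s in range(N)])
    s_query = int(np.argmax(gaps))          # stand-in for the active value-at-risk start-state choice
    observed.append(rollout(soft_policy(learner_w), s_query))   # learner answers from that state
    s_demo = int(np.argmax(gaps / (np.abs(v_opt) + 1e-6)))      # stand-in for the difficulty score ratio
    learner_update(rollout(teacher_opt, s_demo))                # teacher demonstrates from s_demo
    print(f"round {t}: query state {s_query}, demo state {s_demo}, max gap {gaps.max():.2f}")
```

The sketch is only meant to show how the three roles named in the abstract (selecting starting states, inferring the learner's policy from limited feedback, and choosing teaching demonstrations) fit into a single interactive loop.
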
Related papers
- How to Choose a Reinforcement-Learning Algorithm [29.76033485145459]
We streamline the process of choosing reinforcement-learning algorithms and action-distribution families.
We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods.
arXiv Detail & Related papers (2024-07-30T15:54:18Z)
- Closed-loop Teaching via Demonstrations to Improve Policy Transparency [2.5515055736875016]
This paper explores augmenting a curriculum with a closed-loop teaching framework inspired by principles from the education literature.
A user study finds that our proposed closed-loop teaching framework reduces the regret in human test responses by 43% over a baseline.
arXiv Detail & Related papers (2024-04-01T14:59:26Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches were proposed to learn temporal abstractions in the form of options in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method is able to learn transferable components which accelerate learning and performs better than existing methods.
arXiv Detail & Related papers (2022-12-22T14:19:35Z)
- Curriculum Design for Teaching via Demonstrations: Theory and Applications [29.71112499480574]
We study how to design a personalized curriculum over demonstrations to speed up the learner's convergence.
We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC).
arXiv Detail & Related papers (2021-06-08T21:15:00Z)
- Distribution Matching for Machine Teaching [64.39292542263286]
Machine teaching is an inverse problem of machine learning that aims at steering the student learner towards its target hypothesis.
Previous studies on machine teaching focused on balancing the teaching risk and cost to find those best teaching examples.
This paper presents a distribution matching-based machine teaching strategy.
arXiv Detail & Related papers (2021-05-06T09:32:57Z)
- Mastering Rate based Curriculum Learning [78.45222238426246]
We argue that the notion of learning progress itself has several shortcomings that lead to a low sample efficiency for the learner.
We propose a new algorithm, based on the notion of mastering rate, that significantly outperforms learning progress-based algorithms.
arXiv Detail & Related papers (2020-08-14T16:34:01Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms [3.6702509833426613]
We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost.
We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies.
Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost.
arXiv Detail & Related papers (2020-06-14T03:06:27Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.