Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
- URL: http://arxiv.org/abs/2309.09095v1
- Date: Sat, 16 Sep 2023 21:12:04 GMT
- Title: Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
- Authors: Rustam Zayanov, Francisco S. Melo, Manuel Lopes
- Abstract summary: We study the problem of teaching via demonstrations in sequential decision-making tasks.
In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this problem.
- Score: 4.174296652683762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of teaching via demonstrations in sequential
decision-making tasks. In particular, we focus on the situation when the
teacher has no access to the learner's model and policy, and the feedback from
the learner is limited to trajectories that start from states selected by the
teacher. The necessity to select the starting states and infer the learner's
policy creates an opportunity for using the methods of inverse reinforcement
learning and active learning by the teacher. In this work, we formalize the
teaching process with limited feedback and propose an algorithm that solves
this teaching problem. The algorithm uses a modified version of the active
value-at-risk method to select the starting states, a modified maximum causal
entropy algorithm to infer the policy, and the difficulty score ratio method to
choose the teaching demonstrations. We test the algorithm in a synthetic car
driving environment and conclude that the proposed algorithm is an effective
solution when the learner's feedback is limited.
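
Below is a minimal, self-contained sketch of the interactive teaching loop the abstract describes, on a toy chain MDP. The environment, the count-based estimate of the learner's policy, the value-gap criterion for picking starting states, and the value-gap ratio for picking demonstrations are simplified stand-ins chosen for illustration; they are assumptions of this sketch, not the paper's actual active value-at-risk, modified maximum causal entropy, or difficulty score ratio procedures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic chain MDP: states 0..N-1, actions 0 (left) / 1 (right).
N, GAMMA, H = 6, 0.9, 8                   # states, discount, rollout horizon
true_w = np.zeros(N); true_w[-1] = 1.0    # true reward: 1 for being in the rightmost state

def step(s, a):
    return min(s + 1, N - 1) if a == 1 else max(s - 1, 0)

def soft_policy(w):
    """Max-causal-entropy-style policy pi(a|s) = exp(Q(s,a) - V(s)) via soft value iteration."""
    V = np.zeros(N)
    for _ in range(100):
        Q = np.array([[w[s] + GAMMA * V[step(s, a)] for a in (0, 1)] for s in range(N)])
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))   # numerically stable log-sum-exp
    return np.exp(Q - V[:, None])                             # shape (N, 2)

def rollout(policy, s0):
    """Sample a length-H trajectory [(state, action), ...] starting at s0."""
    traj, s = [], s0
    for _ in range(H):
        a = rng.choice(2, p=policy[s])
        traj.append((s, a))
        s = step(s, a)
    return traj

def value(policy, s0, w, n=30):
    """Monte-Carlo estimate of the discounted return of `policy` from s0 under reward w."""
    total = 0.0
    for _ in range(n):
        ret, g, s = 0.0, 1.0, s0
        for _ in range(H):
            ret += g * w[s]
            g *= GAMMA
            s = step(s, rng.choice(2, p=policy[s]))
        total += ret
    return total / n

def feature_counts(traj):
    """Discounted one-hot state-feature counts of a trajectory."""
    phi, g = np.zeros(N), 1.0
    for s, _ in traj:
        phi[s] += g
        g *= GAMMA
    return phi

# Learner: MaxEnt-IRL-style reward weights, updated by feature matching on teacher demos.
learner_w = np.zeros(N)

def learner_update(demo, lr=0.5):
    global learner_w
    pi = soft_policy(learner_w)
    expected = np.mean([feature_counts(rollout(pi, demo[0][0])) for _ in range(20)], axis=0)
    learner_w += lr * (feature_counts(demo) - expected)   # step toward matching demo feature counts

# Teacher: knows true_w but never sees learner_w; it only observes requested trajectories.
teacher_opt = soft_policy(true_w * 20.0)    # sharpened soft policy used as the demonstration policy
observed = []                               # learner trajectories collected so far (limited feedback)

def estimate_learner_policy():
    """Count-based policy estimate (crude stand-in for the paper's modified MaxEnt inference)."""
    counts = np.ones((N, 2))                # Laplace smoothing: uniform where nothing was observed
    for traj in observed:
        for s, a in traj:
            counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

for t in range(10):
    est = estimate_learner_policy()
    v_opt = np.array([value(teacher_opt, s, true_w) for s in range(N)])
    gaps = v_opt - np.array([value(est, s, true_w) for s in range(N)])
    s_query = int(np.argmax(gaps))          # stand-in for the active value-at-risk start-state choice
    observed.append(rollout(soft_policy(learner_w), s_query))   # learner answers from that state
    s_demo = int(np.argmax(gaps / (np.abs(v_opt) + 1e-6)))      # stand-in for the difficulty score ratio
    learner_update(rollout(teacher_opt, s_demo))                # teacher demonstrates from s_demo
    print(f"round {t}: query state {s_query}, demo state {s_demo}, max gap {gaps.max():.2f}")
```

The sketch is only meant to show how the three roles named in the abstract (selecting starting states, inferring the learner's policy from limited feedback, and choosing teaching demonstrations) fit into a single interactive loop.
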
Related papers
- How to Choose a Reinforcement-Learning Algorithm [29.76033485145459]
We streamline the process of choosing reinforcement-learning algorithms and action-distribution families.
We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods.
arXiv Detail & Related papers (2024-07-30T15:54:18Z)
- Closed-loop Teaching via Demonstrations to Improve Policy Transparency [2.5515055736875016]
This paper explores augmenting a curriculum with a closed-loop teaching framework inspired by principles from the education literature.
A user study finds that our proposed closed-loop teaching framework reduces the regret in human test responses by 43% over a baseline.
arXiv Detail & Related papers (2024-04-01T14:59:26Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches were proposed to learn temporal abstractions in the form of options in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method is able to learn transferable components which accelerate learning and performs better than existing methods.
arXiv Detail & Related papers (2022-12-22T14:19:35Z)
- Curriculum Design for Teaching via Demonstrations: Theory and Applications [29.71112499480574]
We study how to design a personalized curriculum over demonstrations to speed up the learner's convergence.
We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC).
arXiv Detail & Related papers (2021-06-08T21:15:00Z)
- Distribution Matching for Machine Teaching [64.39292542263286]
Machine teaching is an inverse problem of machine learning that aims at steering the student learner towards its target hypothesis.
Previous studies on machine teaching focused on balancing the teaching risk and cost to find those best teaching examples.
This paper presents a distribution matching-based machine teaching strategy.
arXiv Detail & Related papers (2021-05-06T09:32:57Z)
- Mastering Rate based Curriculum Learning [78.45222238426246]
We argue that the notion of learning progress itself has several shortcomings that lead to a low sample efficiency for the learner.
We propose a new algorithm, based on the notion of mastering rate, that significantly outperforms learning progress-based algorithms.
arXiv Detail & Related papers (2020-08-14T16:34:01Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms [3.6702509833426613]
We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost.
We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies.
Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost.
arXiv Detail & Related papers (2020-06-14T03:06:27Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.