BC-IRL: Learning Generalizable Reward Functions from Demonstrations
- URL: http://arxiv.org/abs/2303.16194v1
- Date: Tue, 28 Mar 2023 17:57:20 GMT
- Title: BC-IRL: Learning Generalizable Reward Functions from Demonstrations
- Authors: Andrew Szot, Amy Zhang, Dhruv Batra, Zsolt Kira, Franziska Meier
- Abstract summary: BC-IRL is an inverse reinforcement learning method that learns reward functions which generalize better than maximum-entropy IRL approaches.
We show that BC-IRL learns rewards that generalize better on an illustrative simple task and two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
- Score: 51.535870379280155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How well do reward functions learned with inverse reinforcement learning
(IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which
maximize a maximum-entropy objective, learn rewards that overfit to the
demonstrations. Such rewards struggle to provide meaningful rewards for states
not covered by the demonstrations, a major detriment when using the reward to
learn policies in new situations. We introduce BC-IRL a new inverse
reinforcement learning method that learns reward functions that generalize
better when compared to maximum-entropy IRL approaches. In contrast to the
MaxEnt framework, which learns to maximize rewards around demonstrations,
BC-IRL updates reward parameters such that the policy trained with the new
reward matches the expert demonstrations better. We show that BC-IRL learns
rewards that generalize better on an illustrative simple task and two
continuous robotic control tasks, achieving over twice the success rate of
baselines in challenging generalization settings.
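To make the bi-level idea in the abstract concrete, the following is a minimal PyTorch sketch under illustrative assumptions, not the paper's implementation: a toy 2-D point-mass task with hypothetical expert data (expert_states, expert_actions), a single REINFORCE-style inner policy step standing in for a full policy-optimization inner loop, a unit-variance Gaussian policy, and arbitrary network sizes and learning rates. It only shows how differentiating through the inner policy update lets a behavior-cloning loss on the updated policy drive the reward parameters.

```python
# Minimal BC-IRL-style sketch (illustrative assumptions, not the paper's exact setup):
# the reward is updated so that a policy improved on that reward imitates the expert better.
import torch
import torch.nn as nn

obs_dim, act_dim = 2, 2
reward_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
reward_opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def policy_forward(params, states):
    # Run the two-layer policy with an explicit parameter list (functional style).
    w1, b1, w2, b2 = params
    return torch.tanh(states @ w1.t() + b1) @ w2.t() + b2

def inner_policy_update(states, lr=1e-2):
    # One differentiable REINFORCE-style improvement step on the learned reward.
    # create_graph=True keeps the graph so the outer BC loss can reach reward_net.
    mean = policy(states)
    actions = (mean + torch.randn_like(mean)).detach()   # sample a unit-variance Gaussian policy
    log_prob = -((mean - actions) ** 2).sum(-1)
    rewards = reward_net(torch.cat([states, actions], dim=-1)).squeeze(-1)
    surrogate = -(log_prob * rewards).mean()
    grads = torch.autograd.grad(surrogate, list(policy.parameters()), create_graph=True)
    return [p - lr * g for p, g in zip(policy.parameters(), grads)]

# Hypothetical expert demonstrations on a point-mass task: the expert acts toward the origin.
expert_states = torch.randn(256, obs_dim)
expert_actions = -expert_states

for step in range(200):
    new_params = inner_policy_update(expert_states)      # stand-in for on-policy rollouts
    # Outer BC-IRL objective: the *updated* policy should match the expert actions.
    bc_loss = ((policy_forward(new_params, expert_states) - expert_actions) ** 2).mean()
    reward_opt.zero_grad()
    bc_loss.backward()                                   # gradient reaches reward_net through the inner step
    reward_opt.step()
    with torch.no_grad():                                # adopt the inner step; the full algorithm
        for p, q in zip(policy.parameters(), new_params):  # would run a complete RL inner loop here
            p.copy_(q)
```

The contrast with MaxEnt-style IRL is visible in the outer loss: the reward is judged only by how well the policy it induces matches the demonstrations, rather than by how much reward it assigns to the demonstrated states themselves.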
Related papers
- Walking the Values in Bayesian Inverse Reinforcement Learning [66.68997022043075]
A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood.
We propose ValueWalk, a new Markov chain Monte Carlo method based on this insight.
arXiv Detail & Related papers (2024-07-15T17:59:52Z)
- Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards [7.2933135237680595]
Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior.
A reward function might be non-Markovian, depending on more than just the current state, such as a reward machine (RM).
We propose a Bayesian IRL framework for inferring RMs directly from expert behavior, which requires significant changes to the standard framework.
arXiv Detail & Related papers (2024-06-20T04:41:54Z)
- Go Beyond Imagination: Maximizing Episodic Reachability with World Models [68.91647544080097]
In this paper, we introduce a new intrinsic reward design called GoBI (Go Beyond Imagination).
We apply learned world models to generate predicted future states with random actions.
Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks.
arXiv Detail & Related papers (2023-08-25T20:30:20Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Context-Hierarchy Inverse Reinforcement Learning [30.71220625227959]
An inverse reinforcement learning (IRL) agent learns to act intelligently by observing expert demonstrations and learning the expert's underlying reward function.
We present Context Hierarchy IRL (CHIRL), a new IRL algorithm that exploits context to scale up IRL and learn reward functions of complex behaviors.
Experiments on benchmark tasks, including a large-scale autonomous driving task in the CARLA simulator, show promising results in scaling up IRL for tasks with complex reward functions.
arXiv Detail & Related papers (2022-02-25T10:29:05Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground-truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Regularized Inverse Reinforcement Learning [49.78352058771138]
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior.
Regularized IRL applies strongly convex regularizers to the learner's policy.
We propose tractable solutions, and practical methods to obtain them, for regularized IRL.
arXiv Detail & Related papers (2020-10-07T23:38:47Z)
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)
- Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation [1.1470070927586016]
Inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations.
We propose a method to jointly infer a task goal and humans' strategic preferences via network distillation.
We demonstrate that our algorithm can better recover task and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task.
arXiv Detail & Related papers (2020-01-02T16:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.