Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards
- URL: http://arxiv.org/abs/2406.13991v1
- Date: Thu, 20 Jun 2024 04:41:54 GMT
- Title: Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards
- Authors: Noah Topper, Alvaro Velasquez, George Atia
- Abstract summary: Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior.
A reward function might be non-Markovian, depending on more than just the current state; a reward machine (RM) is one such representation.
We propose a Bayesian IRL framework for inferring RMs directly from expert behavior, requiring significant changes to the standard framework.
- Score: 7.2933135237680595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior. There are several approaches to IRL, but most are designed to learn a Markovian reward. However, a reward function might be non-Markovian, depending on more than just the current state, such as a reward machine (RM). Although there has been recent work on inferring RMs, it assumes access to the reward signal, absent in IRL. We propose a Bayesian IRL (BIRL) framework for inferring RMs directly from expert behavior, requiring significant changes to the standard framework. We define a new reward space, adapt the expert demonstration to include history, show how to compute the reward posterior, and propose a novel modification to simulated annealing to maximize this posterior. We demonstrate that our method performs well when optimizing according to its inferred reward and compares favorably to an existing method that learns exclusively binary non-Markovian rewards.
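The abstract names four concrete ingredients: a reward space over reward machines, history-augmented demonstrations, a reward posterior, and a modified simulated annealing search. The sketch below is a minimal, illustrative rendering of those pieces in Python; it is not the authors' implementation, all class and function names are assumptions, and the Boltzmann-style trajectory likelihood is a crude stand-in for the soft-optimal-policy likelihood normally used in Bayesian IRL.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Minimal reward machine: the reward emitted for a label depends on an
    internal automaton state, so it is non-Markovian in the environment state
    alone (illustrative structure, not the paper's exact definition)."""
    n_states: int
    transitions: dict  # transitions[u][label] -> next automaton state
    rewards: dict      # rewards[u][label] -> scalar reward on that step

    def total_reward(self, labels):
        """Replay a label sequence from automaton state 0 and accumulate reward."""
        u, total = 0, 0.0
        for lab in labels:
            total += self.rewards[u].get(lab, 0.0)
            u = self.transitions[u].get(lab, u)
        return total


def log_posterior(rm, expert_trajs, alt_trajs, beta=5.0):
    """Crude stand-in for the reward posterior: each expert label sequence is
    scored by a softmax of its return against a pool of alternative sequences;
    a uniform prior over fixed-size machines contributes only a constant."""
    pool = expert_trajs + alt_trajs
    log_z = math.log(sum(math.exp(beta * rm.total_reward(t)) for t in pool))
    return sum(beta * rm.total_reward(t) - log_z for t in expert_trajs)


def propose(rm, labels):
    """Neighbor proposal: re-draw one transition or one reward entry."""
    new = RewardMachine(
        rm.n_states,
        {u: dict(d) for u, d in rm.transitions.items()},
        {u: dict(d) for u, d in rm.rewards.items()},
    )
    u, lab = random.randrange(rm.n_states), random.choice(labels)
    if random.random() < 0.5:
        new.transitions[u][lab] = random.randrange(rm.n_states)
    else:
        new.rewards[u][lab] = random.choice([-1.0, 0.0, 1.0])
    return new


def anneal(expert_trajs, alt_trajs, labels, n_states=3, steps=2000):
    """Simulated annealing over candidate machines, tracking the best
    posterior seen (a stand-in for the paper's modified annealing scheme)."""
    rm = RewardMachine(
        n_states,
        {u: {lab: random.randrange(n_states) for lab in labels} for u in range(n_states)},
        {u: {lab: 0.0 for lab in labels} for u in range(n_states)},
    )
    best, best_score = rm, log_posterior(rm, expert_trajs, alt_trajs)
    score = best_score
    for step in range(steps):
        temp = max(1e-3, 1.0 - step / steps)  # linear cooling schedule
        cand = propose(rm, labels)
        cand_score = log_posterior(cand, expert_trajs, alt_trajs)
        if cand_score > score or random.random() < math.exp((cand_score - score) / temp):
            rm, score = cand, cand_score
            if score > best_score:
                best, best_score = rm, score
    return best, best_score
```

In the paper the posterior is defined over reward machines together with the expert's history-dependent behaviour; the loop above only conveys the overall propose-score-accept structure of annealing toward a MAP reward machine.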
Related papers
- Walking the Values in Bayesian Inverse Reinforcement Learning [66.68997022043075]
A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood.
We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight.
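The summary does not spell out ValueWalk's construction, so the sketch below only illustrates the baseline it improves on: random-walk Metropolis-Hastings over a tabular reward vector with a Boltzmann-style trajectory likelihood and a Gaussian prior. The function names and likelihood form are assumptions, not the paper's method.

```python
import math
import random


def log_likelihood(reward, expert_trajs, alt_trajs, beta=2.0):
    """Boltzmann-style likelihood of expert state sequences under a tabular
    reward vector, scored against a pool of alternative sequences."""
    def ret(traj):
        return sum(reward[s] for s in traj)
    pool = expert_trajs + alt_trajs
    log_z = math.log(sum(math.exp(beta * ret(t)) for t in pool))
    return sum(beta * ret(t) - log_z for t in expert_trajs)


def mcmc_reward_posterior(n_states, expert_trajs, alt_trajs, steps=5000, step_size=0.1):
    """Random-walk Metropolis-Hastings over reward vectors with a standard
    normal prior; returns the chain of posterior samples."""
    reward = [0.0] * n_states
    logp = log_likelihood(reward, expert_trajs, alt_trajs) - 0.5 * sum(r * r for r in reward)
    samples = []
    for _ in range(steps):
        cand = [r + random.gauss(0.0, step_size) for r in reward]
        cand_logp = (log_likelihood(cand, expert_trajs, alt_trajs)
                     - 0.5 * sum(r * r for r in cand))
        # Metropolis acceptance on the log-posterior ratio.
        if random.random() < math.exp(min(0.0, cand_logp - logp)):
            reward, logp = cand, cand_logp
        samples.append(list(reward))
    return samples
```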
arXiv Detail & Related papers (2024-07-15T17:59:52Z)
- A Novel Variational Lower Bound for Inverse Reinforcement Learning [5.370126167091961]
Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories.
We present a new Variational Lower Bound for IRL (VLB-IRL).
Our method simultaneously learns the reward function and policy under the learned reward function.
arXiv Detail & Related papers (2023-11-07T03:50:43Z)
- DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing [60.21269454707625]
DreamSmooth learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep.
We show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks.
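A temporally smoothed reward target can be illustrated in a few lines. The kernel choice (truncated Gaussian), its width, and the boundary handling below are assumptions for illustration, not DreamSmooth's exact recipe.

```python
import math


def smooth_rewards(rewards, sigma=2.0, radius=5):
    """Temporally smooth a per-step reward sequence with a truncated Gaussian
    kernel; the smoothed values would replace the exact rewards as targets for
    a learned reward predictor."""
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    smoothed = []
    for t in range(len(rewards)):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = t + k
            if 0 <= j < len(rewards):
                acc += kernel[k + radius] * rewards[j]
        smoothed.append(acc / norm)
    return smoothed


# Example: a sparse reward at the end of an episode gets spread over nearby
# timesteps, giving the reward model an easier, denser prediction target.
episode_rewards = [0.0] * 20 + [1.0]
print(smooth_rewards(episode_rewards)[-6:])
```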
arXiv Detail & Related papers (2023-11-02T17:57:38Z)
- STARC: A General Framework For Quantifying Differences Between Reward Functions [55.33869271912095]
We provide a class of pseudometrics on the space of all reward functions that we call STARC metrics.
We show that STARC metrics induce both an upper and a lower bound on worst-case regret.
We also identify a number of issues with reward metrics proposed by earlier works.
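As a rough illustration of a pseudometric on reward functions, the sketch below canonicalizes tabular rewards by mean-subtraction, scales them to unit norm, and measures Euclidean distance. STARC's actual canonicalization removes potential shaping using the transition dynamics, so treat this only as a simplified stand-in, not the paper's definition.

```python
import math


def _canonicalize(reward):
    """Crude stand-in for a canonicalization step: subtract the mean so that
    rewards differing only by a constant map to the same point."""
    mean = sum(reward) / len(reward)
    return [r - mean for r in reward]


def reward_pseudometric(reward_a, reward_b):
    """Illustrative reward pseudometric: canonicalize, scale to unit norm,
    then take the Euclidean distance."""
    def normalize(reward):
        norm = math.sqrt(sum(r * r for r in reward))
        return [r / norm for r in reward] if norm > 0 else reward
    a = normalize(_canonicalize(reward_a))
    b = normalize(_canonicalize(reward_b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Rewards related by a constant shift and a positive scaling preserve the
# ordering of policies, and this pseudometric reports zero distance for them.
print(reward_pseudometric([0.0, 1.0, 2.0], [5.0, 7.0, 9.0]))  # ~0.0
```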
arXiv Detail & Related papers (2023-09-26T20:31:19Z)
- BC-IRL: Learning Generalizable Reward Functions from Demonstrations [51.535870379280155]
BC-IRL is an inverse reinforcement learning method that learns reward functions which generalize better than maximum-entropy IRL approaches.
We show that BC-IRL learns rewards that generalize better on an illustrative simple task and two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
arXiv Detail & Related papers (2023-03-28T17:57:20Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks [2.0305676256390934]
We present a new method to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm.
Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation.
Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation.
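The reward-bonus idea can be sketched as a relabelling pass over episodes before they enter the replay buffer of any off-policy algorithm. The bonus size and the choice to boost only the last few transitions are assumptions, not the paper's exact schedule.

```python
def relabel_rewards(episode, is_demo, is_success, bonus=1.0, last_k=5):
    """Illustrative reward relabelling: add a bonus to the final transitions
    of demonstrations and of successful online episodes, encouraging the agent
    to imitate the expert and its own successes."""
    relabelled = []
    n = len(episode)
    for t, (obs, action, reward, next_obs, done) in enumerate(episode):
        if (is_demo or is_success) and t >= n - last_k:
            reward = reward + bonus
        relabelled.append((obs, action, reward, next_obs, done))
    return relabelled
```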
arXiv Detail & Related papers (2022-01-11T08:35:18Z)
- Self-Supervised Online Reward Shaping in Sparse-Reward Environments [36.01839934355542]
We propose a novel reinforcement learning framework that performs self-supervised online reward shaping.
The proposed framework alternates between updating a policy and inferring a reward function.
Experimental results on several sparse-reward environments demonstrate that the proposed algorithm is significantly more sample efficient than the state-of-the-art baseline.
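The alternation between policy updates and reward inference can be written as a short skeleton; the callables passed in below are placeholders for illustration, not an API from the paper.

```python
def online_reward_shaping(env_rollout, update_policy, fit_reward, policy, reward_fn,
                          iterations=100):
    """Skeleton of the alternating scheme: collect experience, refit a dense
    reward from the agent's own (sparse-return) experience, then update the
    policy against the shaped reward."""
    buffer = []
    for _ in range(iterations):
        # 1. Collect trajectories with the current policy; each trajectory
        #    carries its sparse environment return.
        trajectories = env_rollout(policy)
        buffer.extend(trajectories)
        # 2. Self-supervised step: infer a dense reward from the buffer,
        #    e.g. by fitting it to be consistent with sparse-return rankings.
        reward_fn = fit_reward(buffer, reward_fn)
        # 3. Policy step: improve the policy under the inferred dense reward.
        policy = update_policy(policy, trajectories, reward_fn)
    return policy, reward_fn
```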
arXiv Detail & Related papers (2021-03-08T03:28:04Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
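One common way to incorporate unlabelled data is pseudo-labelling: train a reward model on the annotated transitions, then add unlabelled transitions the model is confident about and retrain. The sketch below shows that generic recipe; it is an assumption, not necessarily the method used in the paper, and `reward_model` is a placeholder object with a predict_proba-style interface.

```python
def expand_with_pseudo_labels(labelled, unlabelled, reward_model, threshold=0.9):
    """Add confidently predicted unlabelled transitions to the training set
    with pseudo-labels; the expanded set is then used to retrain the model."""
    pseudo_labelled = []
    for obs, action in unlabelled:
        prob = reward_model.predict_proba(obs, action)  # P(reward = 1), placeholder
        if prob >= threshold:
            pseudo_labelled.append((obs, action, 1.0))
        elif prob <= 1.0 - threshold:
            pseudo_labelled.append((obs, action, 0.0))
    return labelled + pseudo_labelled
```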
arXiv Detail & Related papers (2020-12-12T20:06:15Z)