Kernel Density Bayesian Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2303.06827v2
- Date: Thu, 12 Oct 2023 23:04:25 GMT
- Title: Kernel Density Bayesian Inverse Reinforcement Learning
- Authors: Aishwarya Mandyam, Didong Li, Diana Cai, Andrew Jones, Barbara E.
Engelhardt
- Abstract summary: Inverse reinforcement learning (IRL) is a powerful framework to infer an agent's reward function by observing its behavior.
We introduce kernel density Bayesian IRL, which uses conditional kernel density estimation to directly approximate the likelihood.
- Score: 6.03878742235195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse reinforcement learning (IRL) is a powerful framework to infer an
agent's reward function by observing its behavior, but IRL algorithms that
learn point estimates of the reward function can be misleading because there
may be several functions that describe an agent's behavior equally well. A
Bayesian approach to IRL models a distribution over candidate reward functions,
alleviating the shortcomings of learning a point estimate. However, several
Bayesian IRL algorithms use a $Q$-value function in place of the likelihood
function. The resulting posterior is computationally intensive to calculate,
has few theoretical guarantees, and the $Q$-value function is often a poor
approximation for the likelihood. We introduce kernel density Bayesian IRL
(KD-BIRL), which uses conditional kernel density estimation to directly
approximate the likelihood, providing an efficient framework that, with a
modified reward function parameterization, is applicable to environments with
complex and infinite state spaces. We demonstrate KD-BIRL's benefits through a
series of experiments in Gridworld environments and a simulated sepsis
treatment task.
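To make the likelihood approximation concrete, here is a minimal sketch of how a conditional kernel density estimate of p(s, a | reward) could plug into a random-walk Metropolis-Hastings sampler over reward parameters. The auxiliary training set of (state, action) pairs generated under known rewards, the Gaussian kernels and bandwidths, the flat prior, and the sampler itself are all assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def conditional_kde_likelihood(sa, theta, train_sa, train_theta, h_sa=0.5, h_theta=0.5):
    """Nadaraya-Watson style conditional KDE of p(s, a | theta), up to a constant.

    train_sa / train_theta hold (state, action) features and the reward parameters
    under which they were generated (an assumed auxiliary training set, mirroring the
    paper's use of demonstrations gathered under known rewards). The constant kernel
    normalizers cancel in the Metropolis-Hastings ratio below.
    """
    w = np.exp(-0.5 * np.sum((train_theta - theta) ** 2, axis=1) / h_theta ** 2)
    k = np.exp(-0.5 * np.sum((train_sa - sa) ** 2, axis=1) / h_sa ** 2)
    return np.sum(w * k) / (np.sum(w) + 1e-12)

def kd_birl_posterior_samples(expert_sa, train_sa, train_theta,
                              n_samples=2000, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings over reward parameters (flat prior assumed)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(train_theta.shape[1])

    def log_lik(th):
        return sum(np.log(conditional_kde_likelihood(sa, th, train_sa, train_theta) + 1e-12)
                   for sa in expert_sa)

    cur, samples = log_lik(theta), []
    for _ in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.shape)
        new = log_lik(prop)
        if np.log(rng.uniform()) < new - cur:  # accept/reject
            theta, cur = prop, new
        samples.append(theta.copy())
    return np.array(samples)
```

The returned samples approximate a posterior over reward parameters rather than a single point estimate, which is the distributional benefit the abstract highlights.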
Related papers
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a training-required and training-free test-time adaptation framework.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
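As a rough illustration of the key-value memory described above, the sketch below keeps a small cache of normalized features and retrieves the top-k most similar entries with softmax weighting. The capacity rule, temperature, and label voting are assumptions of this sketch, not BoostAdapter's implementation.

```python
import numpy as np

class FeatureMemory:
    """Light-weight key-value memory: keys are L2-normalized features, values are labels."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.keys, self.values = [], []

    def add(self, feature, label):
        self.keys.append(feature / (np.linalg.norm(feature) + 1e-12))
        self.values.append(label)
        if len(self.keys) > self.capacity:  # drop the oldest entry (assumed eviction rule)
            self.keys.pop(0)
            self.values.pop(0)

    def retrieve(self, feature, k=8, temperature=0.07):
        q = feature / (np.linalg.norm(feature) + 1e-12)
        sims = np.array([q @ key for key in self.keys])
        top = np.argsort(sims)[-k:]
        weights = np.exp(sims[top] / temperature)
        weights /= weights.sum()
        labels = np.array([self.values[i] for i in top])
        return labels, weights  # weighted vote over retrieved samples
```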
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings with polynomial sample and runtime guarantees.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
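The toy sketch below mirrors that pipeline at a high level: fit the state with kernel ridge regression, differentiate the fit analytically, and select equation terms from a small candidate library. The selection step here is sequentially thresholded least squares (a SINDy-style stand-in); KBASS's spike-and-slab priors and EP-EM inference are not reproduced, and the ODE, kernel length scale, and threshold are assumptions of this sketch.

```python
import numpy as np

def rbf(t1, t2, ell=0.3):
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def rbf_dt(t_query, t_train, ell=0.3):
    # derivative of the RBF kernel with respect to the query time
    d = t_query[:, None] - t_train[None, :]
    return -(d / ell ** 2) * np.exp(-0.5 * (d / ell) ** 2)

# noisy observations of the toy ODE u' = -u
t = np.linspace(0, 4, 60)
u_obs = np.exp(-t) + 0.02 * np.random.default_rng(0).standard_normal(t.size)

alpha = np.linalg.solve(rbf(t, t) + 1e-3 * np.eye(t.size), u_obs)  # kernel ridge fit
u_hat = rbf(t, t) @ alpha
du_hat = rbf_dt(t, t) @ alpha

library = np.column_stack([np.ones_like(u_hat), u_hat, u_hat ** 2])  # candidate terms
coef = np.linalg.lstsq(library, du_hat, rcond=None)[0]
for _ in range(5):  # sequentially thresholded least squares
    small = np.abs(coef) < 0.1
    coef[small] = 0.0
    if (~small).any():
        coef[~small] = np.linalg.lstsq(library[:, ~small], du_hat, rcond=None)[0]
print(coef)  # should recover approximately [0, -1, 0], i.e. du/dt = -u
```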
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning [7.145581090959242]
We propose a memory-based representation learning technique equipped with our new loss functions.
We demonstrate that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries.
arXiv Detail & Related papers (2023-06-06T02:38:01Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning from large-scale machine learning to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
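A heavily simplified sketch of the prior-data fitting idea follows: sample many small datasets from a toy prior and train a network to map a (context set, query input) pair to a predictive distribution for the query output. The linear-function prior, the DeepSets-style encoder (used here in place of a transformer), and the Gaussian predictive head are assumptions of this sketch, not the PFN architecture.

```python
import torch
import torch.nn as nn

class TinyPFN(nn.Module):
    """Amortized posterior-predictive model: context set in, Gaussian prediction out."""

    def __init__(self, hidden=64):
        super().__init__()
        self.point_enc = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, ctx_x, ctx_y, query_x):
        pts = torch.stack([ctx_x, ctx_y], dim=-1)   # (batch, n_ctx, 2)
        summary = self.point_enc(pts).mean(dim=1)   # permutation-invariant summary
        out = self.head(torch.cat([summary, query_x[:, None]], dim=-1))
        return out[:, 0], out[:, 1]                 # predictive mean and log-std

def sample_prior_batch(batch=128, n_ctx=10):
    """Toy prior: random linear functions with Gaussian observation noise."""
    slope = torch.randn(batch, 1)
    x = torch.rand(batch, n_ctx + 1) * 4 - 2
    y = slope * x + 0.1 * torch.randn(batch, n_ctx + 1)
    return x[:, :-1], y[:, :-1], x[:, -1], y[:, -1]

model = TinyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    cx, cy, qx, qy = sample_prior_batch()
    mean, log_std = model(cx, cy, qx)
    loss = (0.5 * ((qy - mean) / log_std.exp()) ** 2 + log_std).mean()  # Gaussian NLL
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At test time a single forward pass on a new context set yields an approximate posterior predictive, which is the amortization the summary describes.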
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Probability Density Estimation Based Imitation Learning [11.262633728487165]
Imitation Learning (IL) is an effective learning paradigm exploiting the interactions between agents and environments.
In this work, a novel reward function based on probability density estimation is proposed for IRL.
We present a "watch-try-learn" style framework named Probability Density Estimation based Imitation Learning (PDEIL).
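As one way to realize a density-estimation-based reward, the sketch below fits kernel density estimates to expert and agent state-action samples and rewards the agent where the expert is relatively more likely. The log-ratio form and the placeholder data are assumptions for illustration and are not PDEIL's exact reward.

```python
import numpy as np
from scipy.stats import gaussian_kde

def make_density_reward(expert_sa, agent_sa, eps=1e-8):
    """expert_sa, agent_sa: (n_samples, d) arrays of concatenated (state, action) features."""
    p_expert = gaussian_kde(expert_sa.T)
    p_agent = gaussian_kde(agent_sa.T)

    def reward(sa):
        sa = np.atleast_2d(sa)
        # higher reward where the expert density dominates the agent's
        return np.log(p_expert(sa.T) + eps) - np.log(p_agent(sa.T) + eps)

    return reward

# usage with random placeholder data
rng = np.random.default_rng(0)
expert = rng.normal(0.0, 0.3, size=(500, 3))
agent = rng.normal(0.5, 0.6, size=(500, 3))
r = make_density_reward(expert, agent)
print(r(np.zeros(3)))  # large near expert-dense regions
```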
arXiv Detail & Related papers (2021-12-13T15:55:38Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
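A minimal Bayesian-optimization loop in the same spirit is sketched below: a Gaussian-process surrogate models how well candidate reward parameters explain the demonstrations, and expected improvement picks the next candidate. The `demo_fit` objective, the plain RBF kernel, and the candidate grid are placeholders; BO-IRL's specialized kernel over reward functions is not reproduced.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(mu, sigma, best, xi=0.01):
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_irl(demo_fit, candidates, n_init=5, n_iter=20, seed=0):
    """Maximize demo_fit(theta) over a grid of candidate reward parameters."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([demo_fit(th) for th in X])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        mu, sigma = gp.predict(candidates, return_std=True)
        nxt = int(np.argmax(expected_improvement(mu, sigma, y.max())))
        X = np.vstack([X, candidates[nxt]])
        y = np.append(y, demo_fit(candidates[nxt]))
    return X[np.argmax(y)], y.max()

# usage with a toy objective over 1-D reward parameters
thetas = np.linspace(-2, 2, 200).reshape(-1, 1)
best_theta, best_val = bo_irl(lambda th: -np.abs(th[0] - 0.7), thetas)
```

Because the surrogate retains every evaluated candidate, inspecting the final GP also exposes multiple high-scoring regions, in line with the paper's emphasis on identifying several consistent rewards.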
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
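To illustrate the "plug in any linear reward after exploration" idea, here is a generic least-squares value iteration sketch over logged transitions with linear features. The ridge penalty, horizon, and feature inputs are assumptions of this sketch, not the paper's algorithm or its exploration phase.

```python
import numpy as np

def lsvi(phi_sa, phi_next_all, rewards, horizon=10, lam=1e-2):
    """
    Finite-horizon least-squares value iteration with linear features.

    phi_sa:       (n, d) features of the logged (s, a) pairs
    phi_next_all: (n, A, d) features of (s', a') for every action a' at the next state
    rewards:      (n,) rewards of the logged pairs under the chosen linear reward
    """
    n, d = phi_sa.shape
    w = np.zeros(d)                               # value weights at the final step
    A_mat = phi_sa.T @ phi_sa + lam * np.eye(d)   # ridge-regularized Gram matrix
    for _ in range(horizon):
        v_next = (phi_next_all @ w).max(axis=1)   # greedy backup at the next state
        targets = rewards + v_next
        w = np.linalg.solve(A_mat, phi_sa.T @ targets)
    return w                                      # Q(s, a) ~= phi(s, a) @ w
```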
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.