Understanding Reward Ambiguity Through Optimal Transport Theory in
Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2310.12055v1
- Date: Wed, 18 Oct 2023 15:42:53 GMT
- Title: Understanding Reward Ambiguity Through Optimal Transport Theory in
Inverse Reinforcement Learning
- Authors: Ali Baheri
- Abstract summary: Inverse reinforcement learning (IRL) aims to infer underlying reward functions from observed expert behaviors.
Current methods often face challenges with high-dimensional problems and lack a geometric foundation.
This paper harnesses the optimal transport (OT) theory to provide a fresh perspective on these challenges.
- Score: 4.8951183832371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In inverse reinforcement learning (IRL), the central objective is to infer
underlying reward functions from observed expert behaviors in a way that not
only explains the given data but also generalizes to unseen scenarios. This
ensures robustness against reward ambiguity, where multiple reward functions can
explain the same expert behaviors equally well. While significant efforts have been
made in addressing this issue, current methods often face challenges with
high-dimensional problems and lack a geometric foundation. This paper harnesses
the optimal transport (OT) theory to provide a fresh perspective on these
challenges. By utilizing the Wasserstein distance from OT, we establish a
geometric framework that allows for quantifying reward ambiguity and
identifying a central representation or centroid of reward functions. These
insights pave the way for robust IRL methodologies anchored in geometric
interpretations, offering a structured approach to tackle reward ambiguity in
high-dimensional settings.
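To make the geometric picture concrete, here is a minimal Python sketch (our illustration, not the paper's implementation): candidate reward functions are treated as one-dimensional distributions over states, pairwise ambiguity is quantified with the Wasserstein-1 distance, and the centroid is a Wasserstein barycenter, which in one dimension reduces to averaging quantile functions. The random reward profiles and the use of numpy/scipy are assumptions for illustration only.

```python
# A minimal sketch, assuming 1-D reward profiles over a finite state space.
# Illustrative only; not the paper's implementation.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Three hypothetical candidate reward functions over 100 states that (by
# assumption) explain the expert data equally well, normalized so each can
# be treated as a distribution over states.
states = np.arange(100)
rewards = [rng.random(100) for _ in range(3)]
rewards = [r / r.sum() for r in rewards]

# Quantify ambiguity: pairwise W1 distances between reward-induced distributions.
for i in range(len(rewards)):
    for j in range(i + 1, len(rewards)):
        d = wasserstein_distance(states, states, rewards[i], rewards[j])
        print(f"W1(r{i}, r{j}) = {d:.4f}")

# Centroid: in one dimension, the Wasserstein barycenter is obtained by
# averaging the quantile (inverse-CDF) functions of the distributions.
qs = np.linspace(0.0, 1.0, 1000)
inverse_cdfs = [np.interp(qs, np.cumsum(r), states) for r in rewards]
barycenter_samples = np.mean(inverse_cdfs, axis=0)  # samples of the central reward profile
```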
Related papers
- An Information-Geometric Approach to Artificial Curiosity [49.1574468325115]
We show that intrinsic rewards should depend on the agent's information about the environment, remaining invariant to how that information is represented.
We show that invariance under congruent Markov morphisms and the agent-environment interaction uniquely constrains intrinsic rewards to concave functions of the reciprocal occupancy (see the worked example below).
This framework provides important constraints on the engineering of intrinsic rewards while integrating foundational exploration methods into a single, cohesive model.
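As a concrete instance of this constraint (our worked example, not taken from the paper): choosing the concave function f = log of the reciprocal occupancy recovers entropy-style exploration, since

```latex
r_{\text{int}}(s) = f\!\left(\frac{1}{\rho_\pi(s)}\right), \quad f \text{ concave};
\qquad f = \log \;\Longrightarrow\;
\mathbb{E}_{s \sim \rho_\pi}\!\left[r_{\text{int}}(s)\right]
= -\sum_{s} \rho_\pi(s)\,\log \rho_\pi(s),
```

which is the Shannon entropy of the occupancy measure.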
arXiv Detail & Related papers (2025-04-08T18:04:15Z)
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z)
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
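To ground the terminology, epsilon-greedy is the canonical myopic policy: it injects undirected noise rather than directing exploration. A minimal sketch (the function name and Q-value interface are our assumptions):

```python
# Minimal epsilon-greedy action selection, a "myopic" exploration policy:
# uniform random with probability epsilon, greedy otherwise.
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform random action
    return int(np.argmax(q_values))              # exploit: greedy w.r.t. current estimates
```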
arXiv Detail & Related papers (2022-06-19T14:44:40Z)
- Learning Long-Term Reward Redistribution via Randomized Return Decomposition [18.47810850195995]
We consider the problem formulation of episodic reinforcement learning with trajectory feedback.
This setting involves an extreme delay of reward signals: the agent obtains only a single reward signal at the end of each trajectory.
We propose a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning.
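A hedged sketch of this idea as the summary describes it (shapes, the reward_net interface, and the subset size are our assumptions): the proxy reward is trained so that a rescaled sum of its predictions on a random subset of steps regresses onto the single episodic return.

```python
# Sketch of a randomized-return-decomposition-style loss: regress a
# subsampled, rescaled sum of proxy rewards onto the episodic return.
import torch

def rrd_loss(reward_net, states, actions, episode_return, k=32):
    T = states.shape[0]
    idx = torch.randperm(T)[:k]                    # random subset of k steps
    r_hat = reward_net(states[idx], actions[idx])  # (k,) predicted per-step rewards
    estimate = (T / k) * r_hat.sum()               # unbiased estimate of the return sum
    return (episode_return - estimate) ** 2        # squared regression error
```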
arXiv Detail & Related papers (2021-11-26T13:23:36Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
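For context, the on-policy (pi-)bisimulation metric that such bounds concern is standardly defined by a fixed-point recursion of the following form (the standard definition from the literature, not necessarily this paper's exact variant):

```latex
d_\pi(s_i, s_j) \;=\; \bigl|\, r^\pi_{s_i} - r^\pi_{s_j} \bigr|
\;+\; \gamma\, W_1\!\bigl(\mathcal{P}^\pi_{s_i},\, \mathcal{P}^\pi_{s_j};\, d_\pi\bigr),
```

where r^pi_s is the expected one-step reward, P^pi_s the next-state distribution under policy pi, and W_1 the 1-Wasserstein distance induced by d_pi itself.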
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
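For reference, the exact elliptical bonus being approximated has a standard closed form; the sketch below computes it exactly with a linear solve (variable names are our assumptions, and avoiding this expensive computation is precisely the paper's point).

```python
# Exact elliptical exploration bonus b(phi) = sqrt(phi^T A^{-1} phi), where A
# is the regularized design matrix of previously visited feature vectors.
import numpy as np

def elliptical_bonus(phi: np.ndarray, past_features: np.ndarray, lam: float = 1.0) -> float:
    d = phi.shape[0]
    A = lam * np.eye(d) + past_features.T @ past_features  # (d, d) design matrix
    return float(np.sqrt(phi @ np.linalg.solve(A, phi)))   # Mahalanobis-style bonus
```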
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- Scalable Bayesian Inverse Reinforcement Learning [93.27920030279586]
We introduce Approximate Variational Reward Imitation Learning (AVRIL).
Our method addresses the ill-posed nature of the inverse reinforcement learning problem.
Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods.
arXiv Detail & Related papers (2021-02-12T12:32:02Z)
- Geometric Entropic Exploration [52.67987687712534]
We introduce a new algorithm, GEM, that maximises the geometry-aware Shannon entropy of state visits in both discrete and continuous domains.
Our key theoretical contribution is casting geometry-aware MSVE exploration as a tractable problem of optimising a simple and novel noise-contrastive objective function.
In our experiments, we show the efficiency of GEM in solving several RL problems with sparse rewards, compared against other deep RL exploration approaches.
arXiv Detail & Related papers (2021-01-06T14:15:07Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks, including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Neural Topological SLAM for Visual Navigation [112.73876869904]
We design topological representations for space that leverage semantics and afford approximate geometric reasoning.
We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation.
arXiv Detail & Related papers (2020-05-25T17:56:29Z)