Weighted Maximum Entropy Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2208.09611v1
- Date: Sat, 20 Aug 2022 06:02:07 GMT
- Title: Weighted Maximum Entropy Inverse Reinforcement Learning
- Authors: The Viet Bui, Tien Mai, Patrick Jaillet
- Abstract summary: We study inverse reinforcement learning (IRL) and imitation learning (IM).
We propose a new way to improve the learning process by adding a weight function to the maximum entropy framework.
Our framework and algorithms allow us to learn both a reward (or policy) function and the structure of the entropy terms added to the Markov Decision Processes.
- Score: 22.269565708490468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study inverse reinforcement learning (IRL) and imitation learning (IM),
the problems of recovering a reward or policy function from an expert's
demonstrated trajectories. We propose a new way to improve the learning process
by adding a weight function to the maximum entropy framework, with the
motivation of having the ability to learn and recover the stochasticity (or the
bounded rationality) of the expert policy. Our framework and algorithms allow
us to learn both a reward (or policy) function and the structure of the entropy
terms added to the Markov Decision Processes, thus enhancing the learning
procedure. Our numerical experiments using human and simulated demonstrations
and with discrete and continuous IRL/IM tasks show that our approach
outperforms prior algorithms.
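As a rough illustration of the idea (our own notation and an assumed form, not quoted from the paper): standard maximum-entropy IRL models the expert as maximizing expected reward plus a uniformly weighted entropy bonus, whereas the weighted variant learns a state-dependent weight on the entropy term together with the reward, letting the recovered model express how stochastic (or boundedly rational) the expert is in different states.

```latex
% Standard maximum-entropy IRL: the expert is modeled as maximizing expected
% reward plus an unweighted entropy bonus.
\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t} r_{\theta}(s_t, a_t) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right]
\]
% Weighted variant (illustrative): a learned, state-dependent weight w_phi
% scales the entropy term, so the reward r_theta and the weight w_phi are
% recovered jointly from demonstrations.
\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t} r_{\theta}(s_t, a_t) + w_{\phi}(s_t)\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right]
\]
```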
Related papers
- A Model-Based Approach for Improving Reinforcement Learning Efficiency
Leveraging Expert Observations [9.240917262195046]
We propose an algorithm that automatically adjusts the weights of each component in the augmented loss function.
Experiments on a variety of continuous control tasks demonstrate that the proposed algorithm outperforms various benchmarks.
arXiv Detail & Related papers (2024-02-29T03:53:02Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - On Reward Structures of Markov Decision Processes [4.13365552362244]
A Markov decision process can be parameterized by a transition kernel and a reward function.
We study various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications.
We develop a novel estimator with an instance-specific error bound of $\tilde{O}\big(\sqrt{\tau_s / n}\big)$ for estimating a single state value.
arXiv Detail & Related papers (2023-08-28T22:29:16Z) - Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z) - Weighted Entropy Modification for Soft Actor-Critic [95.37322316673617]
We generalize the maximum Shannon entropy principle in reinforcement learning (RL) to a weighted entropy by characterizing state-action pairs with qualitative weights (a sketch of this weighted-entropy form appears after this list).
We propose an algorithm that uses the introduced weight function for self-balancing exploration, which leads to state-of-the-art performance on MuJoCo tasks despite its simple implementation.
arXiv Detail & Related papers (2020-11-18T04:36:03Z) - f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
arXiv Detail & Related papers (2020-11-09T19:37:48Z) - Inverse Reinforcement Learning from a Gradient-based Learner [41.8663538249537]
Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations.
In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent.
arXiv Detail & Related papers (2020-07-15T16:41:00Z) - Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch [60.23815709215807]
We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner.
We propose a robust MCE IRL algorithm, which is a principled approach to help with this mismatch.
arXiv Detail & Related papers (2020-07-02T14:57:13Z)
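For the weighted-entropy modification referenced above, a minimal sketch of the general form (our own notation and an assumed weighting scheme, not quoted from that paper):

```latex
% Standard Shannon entropy of the policy at state s:
\[
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s)
\]
% Weighted entropy (illustrative): weights w(s, a) emphasize or de-emphasize
% particular state-action pairs in the exploration bonus.
\[
\mathcal{H}_{w}\big(\pi(\cdot \mid s)\big) = -\sum_{a} w(s, a)\,\pi(a \mid s)\,\log \pi(a \mid s)
\]
```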