Versatile Inverse Reinforcement Learning via Cumulative Rewards
- URL: http://arxiv.org/abs/2111.07667v1
- Date: Mon, 15 Nov 2021 10:49:15 GMT
- Title: Versatile Inverse Reinforcement Learning via Cumulative Rewards
- Authors: Niklas Freymuth and Philipp Becker and Gerhard Neumann
- Abstract summary: Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert.
We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators.
- Score: 22.56145954060092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse Reinforcement Learning infers a reward function from expert
demonstrations, aiming to encode the behavior and intentions of the expert.
Current approaches usually do this with generative and uni-modal models,
meaning that they encode a single behavior. In the common setting, where there
are various solutions to a problem and the experts show versatile behavior, this
severely limits the generalization capabilities of these methods. We propose a
novel method for Inverse Reinforcement Learning that overcomes these problems
by formulating the recovered reward as a sum of iteratively trained
discriminators. We show on simulated tasks that our approach is able to recover
general, high-quality reward functions and produces policies of the same
quality as behavioral cloning approaches designed for versatile behavior.
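The idea of a reward built as a sum of iteratively trained discriminators can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional toy data, the quadratic logistic-regression discriminator, the hand-picked per-round policy samples, and the names `fit_discriminator`, `logit`, and `cumulative_reward` are all assumptions made for illustration. Each round fits a discriminator that separates expert samples from the current policy's samples, and the recovered reward is taken to be the sum of the discriminator logits over all rounds.

```python
import numpy as np

def fit_discriminator(expert, policy, steps=500, lr=0.1):
    """Fit a logistic regression separating expert (label 1) from policy (label 0) samples."""
    x = np.concatenate([expert, policy])
    y = np.concatenate([np.ones(len(expert)), np.zeros(len(policy))])
    # Hypothetical feature map: linear term, quadratic term, bias.
    feats = np.stack([x, x**2, np.ones_like(x)], axis=1)
    w = np.zeros(3)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-feats @ w))   # predicted probability of "expert"
        w -= lr * feats.T @ (p - y) / len(y)   # gradient step on the mean log-loss
    return w

def logit(w, x):
    """Discriminator logit at x for weights w = (linear, quadratic, bias)."""
    x = np.asarray(x, dtype=float)
    return w[0] * x + w[1] * x**2 + w[2]

def cumulative_reward(weights, x):
    """Reward as the sum of the logits of all discriminators trained so far."""
    return sum(logit(w, x) for w in weights)

# Toy bimodal "expert" data with modes at -2 and +2 (assumed for illustration).
expert = np.array([-2.0, 2.0] * 50)
# Hypothetical policy samples per round: first near 0, then collapsed onto +2.
policy_rounds = [np.zeros(100), np.full(100, 2.0)]
weights = [fit_discriminator(expert, p) for p in policy_rounds]

for x in (-2.0, 0.0, 2.0):
    print(x, float(cumulative_reward(weights, x)))
```

In this toy setup the first discriminator rewards both expert modes over the region around 0, and the second one, trained against a policy that covers only the mode at +2, adds reward to the neglected mode at -2. A real system would replace the hand-picked policy samples with samples from a policy optimized against the current cumulative reward.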
Related papers
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Learning Long-Term Reward Redistribution via Randomized Return Decomposition [18.47810850195995]
We consider the problem formulation of episodic reinforcement learning with trajectory feedback.
It refers to an extreme delay of reward signals, in which the agent can only obtain one reward signal at the end of each trajectory.
We propose a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning.
arXiv Detail & Related papers (2021-11-26T13:23:36Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for modelling users' behavioral preferences.
Our model automatically learns rewards from users' actions using a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
- Regularized Inverse Reinforcement Learning [49.78352058771138]
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior.
Regularized IRL applies strongly convex regularizers to the learner's policy.
We propose tractable solutions, and practical methods to obtain them, for regularized IRL.
arXiv Detail & Related papers (2020-10-07T23:38:47Z)
- Reinforcement Learning with Trajectory Feedback [76.94405309609552]
In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback.
Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory.
We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret.
arXiv Detail & Related papers (2020-08-13T17:49:18Z)
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.