Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
- URL: http://arxiv.org/abs/2011.01298v1
- Date: Mon, 2 Nov 2020 20:32:05 GMT
- Title: Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
- Authors: Yuchen Wu, Melissa Mozifian, Florian Shkurti
- Abstract summary: We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The potential benefits of model-free reinforcement learning to real robotics
systems are limited by its uninformed exploration that leads to slow
convergence, lack of data-efficiency, and unnecessary interactions with the
environment. To address these drawbacks we propose a method that combines
reinforcement and imitation learning by shaping the reward function with a
state-and-action-dependent potential that is trained from demonstration data,
using a generative model. We show that this accelerates policy learning by
specifying high-value areas of the state and action space that are worth
exploring first. Unlike the majority of existing methods that assume optimal
demonstrations and incorporate the demonstration data as hard constraints on
policy optimization, we instead incorporate demonstration data as advice in the
form of a reward shaping potential trained as a generative model of states and
actions. In particular, we examine both normalizing flows and Generative
Adversarial Networks to represent these potentials. We show that, unlike many
existing approaches that incorporate demonstrations as hard constraints, our
approach is unbiased even in the case of suboptimal and noisy demonstrations.
We present an extensive range of simulations, as well as experiments on the
Franka Emika 7DOF arm, to demonstrate the practicality of our method.
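The shaping idea in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: the shaping potential Phi(s, a) is taken to be the clipped log-density of a generative model fit to demonstration state-action pairs, here approximated by a diagonal Gaussian as a stand-in for the normalizing flow or GAN used in the paper, and the shaped reward follows the potential-based form r + gamma * Phi(s', a') - Phi(s, a). All class and function names are illustrative.

```python
# Minimal sketch of potential-based reward shaping from demonstrations.
# Assumptions (not from the paper's released code): the generative model is
# approximated by a diagonal Gaussian over demonstration (state, action)
# pairs; the paper itself uses normalizing flows or GANs for this density.
import numpy as np


class DemoPotential:
    """Phi(s, a) = clipped log-density of a generative model fit to demos."""

    def __init__(self, demo_states, demo_actions, floor=-10.0):
        x = np.concatenate([demo_states, demo_actions], axis=1)
        self.mean = x.mean(axis=0)
        self.std = x.std(axis=0) + 1e-6
        self.floor = floor  # keeps Phi bounded far from the demonstrations

    def __call__(self, state, action):
        x = np.concatenate([state, action])
        z = (x - self.mean) / self.std
        log_density = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * self.std ** 2))
        return max(log_density, self.floor)


def shaped_reward(r, phi, s, a, s_next, a_next, gamma=0.99):
    """Add the potential-based shaping term gamma * Phi(s', a') - Phi(s, a)."""
    return r + gamma * phi(s_next, a_next) - phi(s, a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_s = rng.normal(size=(500, 3))   # demonstration states
    demo_a = np.tanh(demo_s[:, :2])      # (possibly noisy) demonstration actions
    phi = DemoPotential(demo_s, demo_a)

    s, a = rng.normal(size=3), rng.normal(size=2)
    s_next, a_next = rng.normal(size=3), rng.normal(size=2)
    print(shaped_reward(0.0, phi, s, a, s_next, a_next))
```

Clipping the log-density at a floor keeps the shaping term bounded away from the demonstrations, which matters when the demonstration data are noisy or suboptimal; because the extra term is potential-based, it acts as advice rather than a hard constraint, which underlies the unbiasedness claim in the abstract.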
Related papers
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents.
AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
arXiv Detail & Related papers (2024-04-11T01:59:29Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
Behavior cloning of expert demonstrations can speed up learning optimal policies in a more sample-efficient way than reinforcement learning.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
arXiv Detail & Related papers (2022-10-17T18:02:19Z)
- Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations (see the sketch after this entry).
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
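As a companion to the state-only imitation entry above, here is a minimal sketch of inverse-dynamics labelling. A ridge-regression map is used as an illustrative stand-in for the learned inverse dynamics network; the function names and toy dynamics are assumptions, not taken from that paper.

```python
# Minimal sketch of inverse-dynamics labelling for state-only demonstrations.
# Assumptions (illustrative, not the paper's code): the inverse dynamics model
# is a ridge-regression map (s_t, s_{t+1}) -> a_t; the paper trains a neural
# network on the agent's own interaction data instead.
import numpy as np


def fit_inverse_dynamics(states, actions, next_states, reg=1e-3):
    """Fit a_t ~ W @ [s_t, s_{t+1}, 1] on agent-collected transitions."""
    x = np.concatenate([states, next_states, np.ones((len(states), 1))], axis=1)
    return np.linalg.solve(x.T @ x + reg * np.eye(x.shape[1]), x.T @ actions)


def label_demo_states(w, demo_states):
    """Predict pseudo-actions for consecutive demonstration states."""
    s, s_next = demo_states[:-1], demo_states[1:]
    x = np.concatenate([s, s_next, np.ones((len(s), 1))], axis=1)
    return x @ w  # pseudo-actions usable as behavior-cloning targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Agent-collected transitions (states, actions, next states).
    s = rng.normal(size=(1000, 4))
    a = rng.normal(size=(1000, 2))
    s_next = s + 0.1 * np.concatenate([a, a], axis=1)  # toy dynamics
    w = fit_inverse_dynamics(s, a, s_next)

    demo_states = rng.normal(size=(50, 4))              # state-only demonstration
    pseudo_actions = label_demo_states(w, demo_states)
    print(pseudo_actions.shape)                          # (49, 2)
```

The pseudo-actions can then serve as targets for behavior cloning on the demonstration states, which is the general recipe the inverse-dynamics entries above rely on.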
This list is automatically generated from the titles and abstracts of the papers on this site.