Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
- URL: http://arxiv.org/abs/2011.01298v1
- Date: Mon, 2 Nov 2020 20:32:05 GMT
- Title: Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
- Authors: Yuchen Wu, Melissa Mozifian, Florian Shkurti
- Abstract summary: We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
- Score: 18.195406135434503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The potential benefits of model-free reinforcement learning to real robotics
systems are limited by its uninformed exploration that leads to slow
convergence, lack of data-efficiency, and unnecessary interactions with the
environment. To address these drawbacks we propose a method that combines
reinforcement and imitation learning by shaping the reward function with a
state-and-action-dependent potential that is trained from demonstration data,
using a generative model. We show that this accelerates policy learning by
specifying high-value areas of the state and action space that are worth
exploring first. Unlike the majority of existing methods that assume optimal
demonstrations and incorporate the demonstration data as hard constraints on
policy optimization, we instead incorporate demonstration data as advice in the
form of a reward shaping potential trained as a generative model of states and
actions. In particular, we examine both normalizing flows and Generative
Adversarial Networks to represent these potentials. We show that, unlike many
existing approaches that incorporate demonstrations as hard constraints, our
approach is unbiased even in the case of suboptimal and noisy demonstrations.
We present an extensive range of simulations, as well as experiments on the
Franka Emika 7DOF arm, to demonstrate the practicality of our method.
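The abstract describes shaping the reward with a potential Φ(s, a) that depends on both state and action and is learned from demonstrations. A minimal sketch of that idea follows, assuming the shaping term takes the state-action form F = γΦ(s', a') - Φ(s, a) (sometimes called look-ahead advice). The potential here is a hypothetical stand-in: a Gaussian kernel density over demonstrated (state, action) pairs, not the normalizing flow or GAN the authors train. The function names (`make_demo_potential`, `shaping_term`, `shaped_reward`, `discounted_shaping_bonus`) and the `bandwidth` parameter are illustrative only.

```python
import math
from typing import Callable, Sequence, Tuple

Vec = Tuple[float, ...]
Potential = Callable[[Vec, Vec], float]


def make_demo_potential(demos: Sequence[Tuple[Vec, Vec]],
                        bandwidth: float = 0.5) -> Potential:
    """Hypothetical stand-in for a trained generative model: Phi(s, a) is the
    mean Gaussian kernel between (s, a) and every demonstrated (s, a) pair,
    so it lies in (0, 1] and is largest near the demonstrations."""
    if not demos:
        raise ValueError("need at least one demonstration pair")
    points = [tuple(s) + tuple(a) for s, a in demos]

    def potential(s: Vec, a: Vec) -> float:
        x = tuple(s) + tuple(a)
        total = 0.0
        for y in points:
            sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / len(points)

    return potential


def shaping_term(s: Vec, a: Vec, s_next: Vec, a_next: Vec,
                 potential: Potential, gamma: float) -> float:
    """Assumed state-action shaping: F = gamma * Phi(s', a') - Phi(s, a)."""
    return gamma * potential(s_next, a_next) - potential(s, a)


def shaped_reward(r: float, s: Vec, a: Vec, s_next: Vec, a_next: Vec,
                  potential: Potential, gamma: float) -> float:
    """Environment reward plus the shaping term; the learner trains on this."""
    return r + shaping_term(s, a, s_next, a_next, potential, gamma)


def discounted_shaping_bonus(traj: Sequence[Tuple[Vec, Vec]],
                             potential: Potential, gamma: float) -> float:
    """Discounted sum of shaping terms along a trajectory of (s, a) pairs.
    The terms telescope to gamma**(T-1) * Phi(s_{T-1}, a_{T-1}) - Phi(s_0, a_0),
    so shaping changes the return only through the endpoint potentials."""
    total = 0.0
    for t in range(len(traj) - 1):
        (s, a), (s_next, a_next) = traj[t], traj[t + 1]
        total += gamma ** t * shaping_term(s, a, s_next, a_next, potential, gamma)
    return total
```

In use, one would build `phi = make_demo_potential(demos)` from demonstration pairs and have an off-the-shelf learner train on `shaped_reward(r, s, a, s_next, a_next, phi, gamma)` instead of `r`. The `discounted_shaping_bonus` helper illustrates the telescoping property that usually explains why potential-based shaping leaves the optimal policy unchanged even when the potential is built from poor demonstrations; treat its link to the paper's specific unbiasedness result as an assumption.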

Related papers
- SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations [68.9300049150948]
 We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID).
Existing data collection approaches yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions.
We present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstration skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood.
 arXiv (2025-05-04T13:00:29Z)
- From Demonstrations to Rewards: Alignment Without Explicit Human Preferences [55.988923803469305]
 In this paper, we propose a fresh perspective on learning alignment based on inverse reinforcement learning principles.
Instead of relying on large preference data, we directly learn the reward model from demonstration data.
 arXiv (2025-03-15T20:53:46Z)
- Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [93.58897637077001]
 This paper tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints.
We pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos.
For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model.
 arXiv (2025-03-11T13:50:22Z)
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
 We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
 Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
 arXiv (2024-05-24T15:13:53Z)
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
 In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents.
AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
 arXiv (2024-04-11T01:59:29Z)
- Inverse Reinforcement Learning by Estimating Expertise of Demonstrators [15.662820454886205]
 IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, is a novel framework that overcomes hurdles without prior knowledge of demonstrator expertise.
IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by combining a general model for demonstrator suboptimality to address reward bias and action variance.
 Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness.
 arXiv (2024-02-02T20:21:09Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
 We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
 arXiv (2023-05-26T14:40:46Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
 We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
 arXiv (2022-10-26T13:08:46Z)
- Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
 Behavior cloning of expert demonstrations can speed up learning optimal policies in a more sample-efficient way than reinforcement learning.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
 arXiv (2022-10-17T18:02:19Z)
- Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
 We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
 arXiv (2021-10-27T04:27:28Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
 In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
 arXiv (2021-03-31T23:46:32Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
 We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
 Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
 arXiv (2020-06-14T06:03:06Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
 In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
 arXiv (2020-04-07T17:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.