Robust Imitation of a Few Demonstrations with a Backwards Model
- URL: http://arxiv.org/abs/2210.09337v1
- Date: Mon, 17 Oct 2022 18:02:19 GMT
- Title: Robust Imitation of a Few Demonstrations with a Backwards Model
- Authors: Jung Yeon Park, Lawson L.S. Wong
- Abstract summary: Behavior cloning of expert demonstrations can speed up learning optimal policies in a more sample-efficient way than reinforcement learning.
We tackle covariate shift, where the agent drifts away from the demonstrations, by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
- Score: 3.8530020696501794
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Behavior cloning of expert demonstrations can speed up learning optimal
policies in a more sample-efficient way than reinforcement learning. However,
the policy cannot extrapolate well to unseen states outside of the
demonstration data, creating covariate shift (agent drifting away from
demonstrations) and compounding errors. In this work, we tackle this issue by
extending the region of attraction around the demonstrations so that the agent
can learn how to get back onto the demonstrated trajectories if it veers
off-course. We train a generative backwards dynamics model and generate short
imagined trajectories from states in the demonstrations. By imitating both
demonstrations and these model rollouts, the agent learns the demonstrated
paths and how to get back onto these paths. With optimal or near-optimal
demonstrations, the learned policy will be both optimal and robust to
deviations, with a wider region of attraction. On continuous control domains,
we evaluate the robustness when starting from different initial states unseen
in the demonstration data. While both our method and other imitation learning
baselines can successfully solve the tasks for initial states in the training
distribution, our method exhibits considerably more robustness to different
initial states.
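The idea in the abstract (roll a backwards dynamics model out from demonstration states, then imitate both the demonstrations and the rollouts) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy dynamics `s_next = s + a`, the linear least-squares stand-in for the generative backwards model, the uniformly sampled actions, the linear policy, and all function names.

```python
import numpy as np


def fit_linear(inputs, targets):
    """Least-squares fit of targets ~ [inputs, 1] @ W."""
    X = np.hstack([inputs, np.ones((len(inputs), 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W


def predict_linear(W, inputs):
    """Apply a model fitted by fit_linear to a batch of inputs."""
    X = np.hstack([inputs, np.ones((len(inputs), 1))])
    return X @ W


def backwards_rollouts(W_back, demo_states, horizon, n_per_state, action_scale, rng):
    """Roll the backwards model out from each demo state.

    Each step samples an action a (uniformly, an assumption of this sketch),
    predicts the previous state, and records (previous state, a): a pair
    whose action leads back toward the demonstration.
    """
    states, actions = [], []
    for s_next in demo_states:
        for _ in range(n_per_state):
            cur = s_next
            for _ in range(horizon):
                a = rng.uniform(-action_scale, action_scale, size=cur.shape)
                prev = predict_linear(W_back, np.concatenate([cur, a])[None, :])[0]
                states.append(prev)
                actions.append(a)
                cur = prev
    return np.array(states), np.array(actions)


def run_toy_example(seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical toy dynamics: s_next = s + a in 2-D.
    s = rng.uniform(-1.0, 1.0, size=(500, 2))
    a = rng.uniform(-0.1, 0.1, size=(500, 2))
    # Backwards model: predict s from (s_next, a).
    W_back = fit_linear(np.hstack([s + a, a]), s)

    # One demonstration: move right along the x-axis.
    demo_states = np.stack([np.linspace(0.0, 0.9, 10), np.zeros(10)], axis=1)
    demo_actions = np.tile([0.1, 0.0], (10, 1))

    roll_states, roll_actions = backwards_rollouts(
        W_back, demo_states, horizon=2, n_per_state=20, action_scale=0.1, rng=rng
    )

    # Behavior cloning on the union of demonstrations and model rollouts.
    W_policy = fit_linear(
        np.vstack([demo_states, roll_states]),
        np.vstack([demo_actions, roll_actions]),
    )
    return W_policy
```

In this toy setting, a policy cloned from the demonstration alone never sees a state off the x-axis, while the rollout pairs supply states around the path together with actions that return to it. Calling `predict_linear(run_toy_example(), np.array([[0.5, 0.2]]))` gives an action whose second component points back toward the demonstrated path.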
Related papers
- Zero-shot Imitation Policy via Search in Demonstration Dataset [0.16817021284806563]
Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
arXiv Detail & Related papers (2024-01-29T18:38:29Z)
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
Agent policy will be optimized to cheat the discriminator and produce trajectories similar to those optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the assumption to requiring only that the demonstrator and the imitator have the same state spaces.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly-emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' abilities in limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Learning Feasibility to Imitate Demonstrators with Different Dynamics [23.239058855103067]
The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations.
We learn a feasibility metric that captures the likelihood of a demonstration being feasible for the imitator.
Our experiments on four simulated environments and on a real robot show that the policy learned with our approach achieves a higher expected return than prior works.
arXiv Detail & Related papers (2021-10-28T14:15:47Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.