Get Back Here: Robust Imitation by Return-to-Distribution Planning
- URL: http://arxiv.org/abs/2305.01400v1
- Date: Tue, 2 May 2023 13:19:08 GMT
- Title: Get Back Here: Robust Imitation by Return-to-Distribution Planning
- Authors: Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin,
Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin,
Robert Dadashi
- Abstract summary: We consider the Imitation Learning (IL) setup where expert data are not collected in the actual deployment environment but in a different version of it.
To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked to bring the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution.
The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time.
- Score: 43.26690674765619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the Imitation Learning (IL) setup where expert data are not
collected in the actual deployment environment but in a different version of it. To
address the resulting distribution shift, we combine behavior cloning (BC) with
a planner that is tasked to bring the agent back to states visited by the
expert whenever the agent deviates from the demonstration distribution. The
resulting algorithm, POIR, can be trained offline, and leverages online
interactions to efficiently fine-tune its planner to improve performance over
time. We test POIR on a variety of human-generated manipulation demonstrations
in a realistic robotic manipulation simulator and show robustness of the
learned policy to different initial state distributions and noisy dynamics.
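The control flow described in the abstract can be summarized in a few lines of code. The sketch below is illustrative only, not the authors' implementation: the toy 2-D environment, the nearest-neighbor out-of-distribution test, the greedy planner, and all names and thresholds (`bc_action`, `plan_return`, `ood_threshold`) are hypothetical stand-ins; in POIR the planner is learned and fine-tuned from online interactions.

```python
# Minimal sketch (not the authors' code): behavior cloning with a
# return-to-distribution fallback, mirroring the control flow in the abstract.
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration data and a stand-in for a learned BC policy.
expert_states = rng.normal(size=(500, 2))       # states visited by the expert

def bc_action(s):
    return -0.1 * s                             # pretend BC policy (drifts toward origin)

def distance_to_demos(s):
    """Distance from state s to the closest demonstrated state."""
    return float(np.min(np.linalg.norm(expert_states - s, axis=1)))

def plan_return(s, horizon=5):
    """Stand-in planner: step toward the nearest expert-visited state."""
    target = expert_states[np.argmin(np.linalg.norm(expert_states - s, axis=1))]
    return (target - s) / horizon

def act(s, ood_threshold=1.0):
    """Follow BC while the state looks in-distribution; otherwise return to the demos."""
    if distance_to_demos(s) < ood_threshold:
        return bc_action(s)
    return plan_return(s)

# Rollout with noisy dynamics, starting far outside the demonstration distribution.
s = np.array([4.0, -4.0])
for _ in range(20):
    s = s + act(s) + rng.normal(scale=0.05, size=2)
print("final distance to demonstrations:", distance_to_demos(s))
```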
Related papers
- Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.
By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.
On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Zero-shot Imitation Policy via Search in Demonstration Dataset [0.16817021284806563]
Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
arXiv Detail & Related papers (2024-01-29T18:38:29Z)
- Behavioral Cloning via Search in Video PreTraining Latent Space [0.13999481573773073]
We formulate our control problem as a search problem over a dataset of experts' demonstrations.
We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model.
The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory from the dataset does not diverge (a minimal sketch of this search-based recipe appears after this list).
arXiv Detail & Related papers (2022-12-27T00:20:37Z)
- Back to the Manifold: Recovering from Out-of-Distribution States [20.36024602311382]
We propose a recovery policy that brings the agent back to the training manifold whenever it steps out of the in-distribution states.
We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform.
arXiv Detail & Related papers (2022-07-18T15:10:58Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
- SAFARI: Safe and Active Robot Imitation Learning with Imagination [16.967930721746676]
SAFARI is a novel active learning and control algorithm.
It allows an agent to request further human demonstrations when out-of-distribution situations are encountered.
We show how this method enables the agent to autonomously predict failure rapidly and safely.
arXiv Detail & Related papers (2020-11-18T23:43:59Z)
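Two of the related papers above ("Zero-shot Imitation Policy via Search in Demonstration Dataset" and "Behavioral Cloning via Search in Video PreTraining Latent Space") share a search-based recipe: encode observations with a pre-trained model, retrieve the nearest demonstration frame in latent space, and replay the expert's actions until the latent distance diverges. The sketch below is a hedged toy version of that recipe, not the papers' code: the random-projection encoder stands in for a pre-trained foundation model (e.g. Video PreTraining), and the distance threshold and function names are illustrative assumptions.

```python
# Toy sketch of imitation-by-search in a latent space (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM = 32, 8
W = rng.normal(size=(OBS_DIM, LATENT_DIM))      # stand-in "pre-trained" encoder

def encode(obs):
    return obs @ W

# Demonstration dataset: observations paired with the actions the expert took.
demo_obs = rng.normal(size=(200, OBS_DIM))
demo_actions = rng.integers(0, 4, size=200)
demo_latents = encode(demo_obs)

def nearest_demo(obs):
    """Index of the demonstration frame closest to obs in latent space."""
    d = np.linalg.norm(demo_latents - encode(obs), axis=1)
    i = int(np.argmin(d))
    return i, d[i]

def search_policy(obs, followed_idx=None, max_dist=5.0):
    """Copy actions from the selected trajectory while it stays close in latent
    space; re-run the nearest-neighbor search once the distance diverges."""
    if followed_idx is not None:
        dist = np.linalg.norm(demo_latents[followed_idx] - encode(obs))
        if dist <= max_dist:
            return demo_actions[followed_idx], followed_idx
    i, _ = nearest_demo(obs)
    return demo_actions[i], i

action, idx = search_policy(rng.normal(size=OBS_DIM))
print("copied expert action:", action, "from demo frame", idx)
```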