Back to the Manifold: Recovering from Out-of-Distribution States
- URL: http://arxiv.org/abs/2207.08673v1
- Date: Mon, 18 Jul 2022 15:10:58 GMT
- Title: Back to the Manifold: Recovering from Out-of-Distribution States
- Authors: Alfredo Reichlin, Giovanni Luca Marchetti, Hang Yin, Ali Ghadirzadeh and Danica Kragic
- Abstract summary: We propose a recovery policy that brings the agent back to the training manifold whenever it leaves the set of in-distribution states.
We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform.
- Score: 20.36024602311382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from previously collected datasets of expert data offers the promise
of acquiring robotic policies without unsafe and costly online exploration.
However, a major challenge is the distributional shift between the states in the
training dataset and those visited by the learned policy at test time. While
prior work has mainly studied the distribution shift induced by the policy
during offline training, the problem of recovering from out-of-distribution
states at deployment time remains largely unexplored. We alleviate the
distributional shift at deployment time by introducing a recovery policy that
brings the agent back to the training manifold whenever it leaves the set of
in-distribution states, e.g., due to an external perturbation. The recovery
policy relies on an approximation of the training data density and a learned
equivariant mapping that maps visual observations into a latent space in which
translations correspond to robot actions. We demonstrate the effectiveness of
the proposed method through several manipulation experiments on a real robotic
platform. Our results show that the recovery policy enables the agent to
complete tasks in situations where behavioral cloning alone fails because of
distributional shift.
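To make the recovery mechanism concrete, the following is a minimal sketch of a single recovery step under the stated assumptions: a pretrained equivariant encoder and an approximate training-data density model, both with hypothetical interfaces. It illustrates the idea only and is not the authors' implementation.

```python
import torch

def recovery_action(obs, encoder, log_density, step_size=0.05):
    """One recovery step toward the training manifold.

    Hypothetical interfaces (not the authors' code):
      encoder(obs)   -> latent z; assumed equivariant, so a translation in
                        latent space corresponds to a robot action
      log_density(z) -> scalar log of an approximate training-data density
    """
    z = encoder(obs).detach().requires_grad_(True)  # latent state of current observation
    log_density(z).backward()                       # gradient of log-density w.r.t. z
    grad = z.grad
    # Take a small step up the density gradient; under the equivariance
    # assumption this latent translation can be read out as a robot action.
    action = step_size * grad / (grad.norm() + 1e-8)
    return action.detach().cpu().numpy()
```

At deployment, one could monitor log_density(encoder(obs)) and trigger such recovery steps whenever it drops below a threshold, handing control back to the base behavioral-cloning policy once the density recovers.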
Related papers
- Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning [2.6696199945489534]
We propose an object-centric recovery policy framework to address the challenges of out-of-distribution scenarios in visuomotor policy learning.
We demonstrate the effectiveness of our framework in both simulation and real robot experiments.
arXiv Detail & Related papers (2024-11-05T17:41:14Z)
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions; a minimal sketch follows this entry.
During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.
We present qualitative and quantitative results, demonstrating significant improvements over the prior state of the art in closed-loop testing.
arXiv Detail & Related papers (2024-09-25T06:48:25Z)
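As a minimal illustration of the world-model idea summarized in the entry above, the sketch below predicts the next state from a history of states and actions. The architecture and names are hypothetical and not taken from the cited paper.

```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Toy world model: predicts the next state from past states and actions."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, states, actions):
        # states: (batch, T, state_dim), actions: (batch, T, action_dim)
        x = torch.cat([states, actions], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h[:, -1])  # predicted next state

# Training would regress predictions onto observed transitions, e.g.
# loss = torch.nn.functional.mse_loss(model(states, actions), next_state)
```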
- Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning [1.9336815376402723]
Offline RL methods leverage previous experiences to learn better policies than the behavior policy used for data collection.
However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training.
We introduce a novel method named State Reconstruction for Diffusion Policies (SRDP), incorporating state reconstruction feature learning in the recent class of diffusion policies.
arXiv Detail & Related papers (2023-07-10T17:34:23Z)
- Get Back Here: Robust Imitation by Return-to-Distribution Planning [43.26690674765619]
We consider the Imitation Learning (IL) setup where expert data are collected not in the actual deployment environment but in a different version of it.
To address the resulting distribution shift, we combine behavior cloning (BC) with a planner tasked with bringing the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution; a rough sketch of this switching scheme follows this entry.
The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time.
arXiv Detail & Related papers (2023-05-02T13:19:08Z)
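As a rough illustration of the BC-plus-planner scheme summarized in the entry above, the sketch below switches between a cloned policy and a return planner. All interfaces are hypothetical; this is not the POIR implementation.

```python
def act(obs, bc_policy, density, planner, threshold=0.1):
    """Switch between imitation and return-to-distribution planning.

    Hypothetical interfaces:
      bc_policy(obs)          -> action from a behavior-cloned policy
      density(obs)            -> estimate of how likely obs is under expert data
      planner.plan_back(obs)  -> list of actions leading back toward expert-visited states
    """
    if density(obs) >= threshold:
        # State looks in-distribution: follow the cloned expert policy.
        return bc_policy(obs)
    # State has drifted: take the first step of a plan back to expert states.
    return planner.plan_back(obs)[0]
```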
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded, fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
- Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and lower running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.