IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning
- URL: http://arxiv.org/abs/2405.01472v1
- Date: Thu, 2 May 2024 17:06:19 GMT
- Title: IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning
- Authors: Ryan Hoque, Ajay Mandlekar, Caelan Garrett, Ken Goldberg, Dieter Fox
- Abstract summary: A popular approach for increasing policy robustness to distribution shift is interactive imitation learning.
We propose IntervenGen, a novel data generation system that can autonomously produce a large set of corrective interventions with rich state-space coverage from a small number of human interventions.
We show that it can increase policy robustness by up to 39x with only 10 human interventions.
- Score: 43.19346528232497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective interventions during policy rollouts. However, collecting a sufficient amount of interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data generation system that can autonomously produce a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39x with only 10 human interventions. Videos and more results are available at https://sites.google.com/view/intervengen2024.
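The mechanism suggested by the abstract, turning a handful of human corrections into a large synthetic intervention dataset, can be sketched as follows. This is a hedged illustration, not the authors' released code: the environment interface (`reset`, `step`, `in_mistake_region`, `object_pose`, `follow_waypoints`, `current_obs`, `success`) is hypothetical, and the pose-transform replay strategy is an assumption about how recorded corrections could be re-targeted to new scene configurations.

```python
"""Hedged sketch of intervention data generation in the spirit of I-Gen.
All environment methods used below are hypothetical, not the paper's API."""
import random
import numpy as np


def transform_segment(eef_waypoints, src_object_pose, new_object_pose):
    """Re-target a recorded human correction: express each end-effector
    waypoint relative to the object frame it was recorded in, then map it
    through the object's pose in the new scene (4x4 homogeneous matrices)."""
    T = new_object_pose @ np.linalg.inv(src_object_pose)
    return [T @ wp for wp in eef_waypoints]


def generate_interventions(env, policy, human_segments, n_attempts=1000):
    """Roll out the base policy; when it enters a mistake region, replay a
    transformed human correction. Keep only episodes that end in success."""
    dataset = []
    for _ in range(n_attempts):
        obs, traj = env.reset(), []
        while not env.done():
            if env.in_mistake_region(obs):
                # Each segment: (end-effector waypoints, object pose at recording).
                seg = random.choice(human_segments)
                waypoints = transform_segment(seg[0], seg[1], env.object_pose())
                traj.extend(env.follow_waypoints(waypoints))
                obs = env.current_obs()
            else:
                action = policy(obs)
                traj.append((obs, action))
                obs = env.step(action)
        if env.success():  # filter: only successful corrections become data
            dataset.append(traj)
    return dataset
```

The success filter at the end is the key design choice in this reading: synthetic replays that fail to correct the mistake are discarded rather than added to the training set.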
Related papers
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions.
During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.
We present qualitative and quantitative results, demonstrating significant improvements over the prior state of the art in closed-loop testing.
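As a concrete reference for the world-model definition above (a network predicting the agent's next state from states and actions), here is a minimal PyTorch sketch; the single-step MLP, dimensions, and MSE alignment loss are illustrative assumptions, and the paper's latent-space generative model is more elaborate.

```python
import torch
import torch.nn as nn


class WorldModel(nn.Module):
    """Minimal world model: predict the next state from state and action."""

    def __init__(self, state_dim=32, action_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


# Covariate-shift mitigation, schematically: penalize predicted rollout states
# for drifting away from states observed in human demonstrations (simplified).
def alignment_loss(model, state, action, demo_next_state):
    return nn.functional.mse_loss(model(state, action), demo_next_state)
```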
arXiv Detail & Related papers (2024-09-25T06:48:25Z)
- MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention [81.56607128684723]
We introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention.
MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions.
It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function.
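A toy tabular analogue of this residual decomposition is sketched below: the residual reward is the gap between the expert's and the prior policy's rewards, and only a residual Q-table is updated while the prior Q stays fixed. MEReQ itself is a deep maximum-entropy method; this sketch only illustrates the decomposition, and all names are ours.

```python
import numpy as np


def residual_q_update(Q_prior, Q_res, s, a, r_res, s_next,
                      alpha=0.1, gamma=0.99):
    """One TD step on the residual Q-table. The effective value function is
    Q_prior + Q_res; only the residual part is updated (Q_prior stays fixed).
    `r_res` is the residual reward: expert reward minus prior-policy reward."""
    target = r_res + gamma * np.max(Q_prior[s_next] + Q_res[s_next])
    td_error = target - (Q_prior[s, a] + Q_res[s, a])
    Q_res[s, a] += alpha * td_error
```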
arXiv Detail & Related papers (2024-06-24T01:51:09Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
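For reference, the age-of-information objective mentioned above can be evaluated for a given sampling/transmission schedule as below. This is a toy metric computation, not the paper's decentralized GNN policy.

```python
def time_average_aoi(updates, horizon):
    """Time-averaged age of information over [0, horizon].
    `updates`: (generation_time, delivery_time) pairs sorted by delivery time,
    with generation times non-decreasing. Age at time t is t minus the
    generation time of the freshest delivered sample; it grows with slope 1
    between deliveries and drops at each delivery."""
    integral, t_prev, gen_prev = 0.0, 0.0, 0.0
    for gen, delivered in updates:
        if delivered > horizon:
            break
        dt = delivered - t_prev
        age_at_t_prev = t_prev - gen_prev
        integral += age_at_t_prev * dt + 0.5 * dt * dt  # area under age curve
        t_prev, gen_prev = delivered, gen
    dt = horizon - t_prev
    integral += (t_prev - gen_prev) * dt + 0.5 * dt * dt
    return integral / horizon
```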
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation [8.993237527071756]
We introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy.
We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening.
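The BCVA construction described above, a state value function trained jointly with a behavioral-cloning policy, can be sketched as a shared trunk with two heads; the dimensions and the help-request threshold are illustrative assumptions on our part.

```python
import torch
import torch.nn as nn


class BCVA(nn.Module):
    """Behavioral-cloning policy with a jointly trained state-value head."""

    def __init__(self, obs_dim=64, act_dim=7, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, act_dim)  # imitates demonstrations
        self.value_head = nn.Linear(hidden, 1)         # predicts state value

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h)


def should_ask_for_help(value, threshold=0.0):
    """Failure prediction: request human help when predicted value is low."""
    return value.item() < threshold
```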
arXiv Detail & Related papers (2023-02-08T20:56:23Z)
- Learning Latent Traits for Simulated Cooperative Driving Tasks [10.009803620912777]
We build a framework capable of capturing a compact latent representation of the human in terms of their behavior and preferences.
We then build a lightweight simulation environment, HMIway-env, for modelling one form of distracted driving behavior.
We finally use this environment to quantify both the ability to discriminate drivers and the effectiveness of intervention policies.
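One way to read "compact latent representation of the human" is an encoder that maps a window of observed driving behavior to a low-dimensional trait vector. The sketch below is our illustration of that interface, not the paper's model; all dimensions are arbitrary.

```python
import torch
import torch.nn as nn


class TraitEncoder(nn.Module):
    """Map a window of behavior features to a compact latent trait vector z."""

    def __init__(self, feat_dim=6, window=20, z_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),  # (batch, window, feat_dim) -> (batch, window*feat_dim)
            nn.Linear(window * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, behavior_window):
        return self.net(behavior_window)  # z summarizes behavior and preferences
```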
arXiv Detail & Related papers (2022-07-20T02:27:18Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves average performance over the next best-performing offline reinforcement learning method by 49% on heterogeneous datasets.
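Schematically, a latent-variable policy pi(a | s, z) with advantage weighting looks like the following; the Gaussian head with fixed scale and the weight clipping are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn


class LatentPolicy(nn.Module):
    """pi(a | s, z): action distribution conditioned on state and a latent mode
    z, so one policy can cover several behavior modes in a heterogeneous set."""

    def __init__(self, s_dim=8, z_dim=4, a_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim),
        )

    def log_prob(self, a, s, z):
        mean = self.net(torch.cat([s, z], dim=-1))
        return torch.distributions.Normal(mean, 1.0).log_prob(a).sum(-1)


def advantage_weighted_loss(policy, s, a, z, adv, beta=1.0):
    """Weight log-likelihood by exp(advantage / beta): high-advantage
    transitions dominate, low-advantage ones are softly filtered out."""
    weights = torch.clamp(torch.exp(adv / beta), max=20.0)
    return -(weights * policy.log_prob(a, s, z)).mean()
```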
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments [18.8481771211768]
We introduce a novel approach to tackle such a sim-to-real problem by developing policies capable of adapting to new environments.
Key to our approach is an error-aware policy (EAP) that is explicitly made aware of the effect of unobservable factors during training.
We show that a trained EAP for a hip-torque assistive device can be transferred to different human agents with unseen biomechanical characteristics.
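The phrase "explicitly made aware of the effect of unobservable factors" suggests a policy that consumes an error estimate alongside its observation; here is a minimal sketch under that reading, with all dimensions illustrative.

```python
import torch
import torch.nn as nn


class ErrorAwarePolicy(nn.Module):
    """Policy conditioned on both the observation and an explicit estimate of
    the error introduced by unobservable factors, enabling zero-shot
    adaptation to new environments or human agents."""

    def __init__(self, obs_dim=16, err_dim=4, act_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + err_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, err_estimate):
        return self.net(torch.cat([obs, err_estimate], dim=-1))
```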
arXiv Detail & Related papers (2021-03-13T15:36:44Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
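The iterative scheme described above, roll the policy out, let the human intervene, retrain on the aggregated data, can be written schematically as below. `collect_rollout_with_interventions` and `train_bc` are hypothetical helpers, and the intervention-upweighting flag is our shorthand for emphasizing corrective samples during training.

```python
def human_in_the_loop_training(env, policy, human,
                               n_rounds=3, episodes_per_round=50):
    """Schematic human-in-the-loop loop: the policy acts until the human
    operator takes over to correct it; intervention segments are flagged in
    each trajectory so training can emphasize them."""
    data = []
    for _ in range(n_rounds):
        for _ in range(episodes_per_round):
            # Hypothetical helper: returns (obs, action, is_intervention) tuples.
            data.append(collect_rollout_with_interventions(env, policy, human))
        # Hypothetical helper: behavioral cloning that upweights intervention
        # samples, since they mark exactly where the policy needed correction.
        policy = train_bc(data, upweight_interventions=True)
    return policy
```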
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
- Offline Learning for Planning: A Summary [0.0]
Training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment.
Data sets containing recorded experiences of intelligent agents performing various tasks are accessible on the internet.
In this paper, we outline the ideas motivating the development of the state-of-the-art offline learning baselines.
arXiv Detail & Related papers (2020-10-05T11:41:11Z)