Human-in-the-Loop Imitation Learning using Remote Teleoperation
- URL: http://arxiv.org/abs/2012.06733v1
- Date: Sat, 12 Dec 2020 05:30:35 GMT
- Title: Human-in-the-Loop Imitation Learning using Remote Teleoperation
- Authors: Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese
- Abstract summary: We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
- Score: 72.2847988686463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation Learning is a promising paradigm for learning complex robot
manipulation skills by reproducing behavior from human demonstrations. However,
manipulation tasks often contain bottleneck regions that require a sequence of
precise actions to make meaningful progress, such as a robot inserting a pod
into a coffee machine to make coffee. Trained policies can fail in these
regions because small deviations in actions can lead the policy into states not
covered by the demonstrations. Intervention-based policy learning is an
alternative that can address this issue -- it allows human operators to monitor
trained policies and take over control when they encounter failures. In this
paper, we build a data collection system tailored to 6-DoF manipulation
settings that enables remote human operators to monitor and intervene on
trained policies. We develop a simple and effective algorithm to train the
policy iteratively on new data collected by the system that encourages the
policy to learn how to traverse bottlenecks through the interventions. We
demonstrate that agents trained on data collected by our intervention-based
system and algorithm outperform agents trained on an equivalent number of
samples collected by non-interventional demonstrators, and further show that
our method outperforms multiple state-of-the-art baselines for learning from
the human interventions on a challenging robot threading task and a coffee
making task. Additional results and videos at
https://sites.google.com/stanford.edu/iwr .
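The abstract describes a human-in-the-loop cycle: roll out the current policy, let a remote operator take over control near bottleneck regions, then retrain on the aggregated data so the policy learns the corrective behavior. Below is a minimal Python sketch of what such an intervention-based collection and retraining loop could look like. It is not the authors' released system; every environment, policy, and operator-interface name (StubEnv, StubPolicy, operator_intervenes, operator_action, intervention_weight) is a hypothetical placeholder, and upweighting intervention samples is only one plausible reading of "encourages the policy to learn how to traverse bottlenecks through the interventions".

```python
import numpy as np

class StubEnv:
    """Hypothetical placeholder for a 6-DoF manipulation environment."""
    def __init__(self, horizon=50):
        self.horizon, self.t = horizon, 0
    def reset(self):
        self.t = 0
        return np.zeros(10)                      # placeholder observation
    def step(self, action):
        self.t += 1
        return np.zeros(10), self.t >= self.horizon

class StubPolicy:
    """Hypothetical placeholder for the trained manipulation policy."""
    def act(self, obs):
        return np.zeros(6)                       # placeholder 6-DoF action
    def fit_weighted(self, states, actions, weights):
        pass                                     # a weighted behavioral-cloning update would go here

def operator_intervenes(obs):
    """Hypothetical hook: remote operator decides to take over control."""
    return False

def operator_action(obs):
    """Hypothetical hook: teleoperated 6-DoF command from the operator."""
    return np.zeros(6)

def collect_episode(env, policy):
    """Roll out the policy, recording whether each action came from an intervention."""
    obs, done, episode = env.reset(), False, []
    while not done:
        if operator_intervenes(obs):
            action, intervened = operator_action(obs), True
        else:
            action, intervened = policy.act(obs), False
        episode.append((obs, action, intervened))
        obs, done = env.step(action)
    return episode

def retrain(policy, dataset, intervention_weight=5.0):
    """Refit on all collected data, upweighting intervention samples so the
    policy imitates the corrective behavior in bottleneck regions (one
    plausible weighting scheme; the paper's exact algorithm may differ)."""
    states = np.stack([s for s, _, _ in dataset])
    actions = np.stack([a for _, a, _ in dataset])
    weights = np.array([intervention_weight if i else 1.0 for _, _, i in dataset])
    policy.fit_weighted(states, actions, weights)
    return policy

# Iterative human-in-the-loop training: collect with interventions, then retrain.
env, policy, dataset = StubEnv(), StubPolicy(), []
for _ in range(3):                               # a few collection/retraining rounds
    dataset += collect_episode(env, policy)
    policy = retrain(policy, dataset)
```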
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - MILES: Making Imitation Learning Easy with Self-Supervision [12.314942459360605]
MILES is a fully autonomous, self-supervised data collection paradigm.
We show that MILES enables efficient policy learning from just a single demonstration and a single environment reset.
arXiv Detail & Related papers (2024-10-25T17:06:50Z) - IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning [43.19346528232497]
A popular approach for increasing policy robustness to distribution shift is interactive imitation learning.
We propose IntervenGen, a novel data generation system that can autonomously produce a large set of corrective interventions.
We show that it can increase policy robustness by up to 39x with only 10 human interventions.
arXiv Detail & Related papers (2024-05-02T17:06:19Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Leveraging Sequentiality in Reinforcement Learning from a Single
Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve with unprecedented sample efficiency some challenging simulated tasks such as humanoid locomotion and stand-up.
arXiv Detail & Related papers (2022-11-09T10:28:40Z) - Self-Supervised Learning of Multi-Object Keypoints for Robotic
Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z) - Learning to Guide Multiple Heterogeneous Actors from a Single Human
Demonstration via Automatic Curriculum Learning in StarCraft II [0.5911087507716211]
In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors.
Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines.
arXiv Detail & Related papers (2022-05-11T21:53:11Z) - Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z) - A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that a single robotic arm can learn sparse-reward manipulation policies from pixels.
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z) - Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning
Systems [0.8223798883838329]
This research investigates how to integrate human interaction modalities to the reinforcement learning loop.
Results show that the reward signal that is learned based upon human interaction accelerates the rate of learning of reinforcement learning algorithms.
arXiv Detail & Related papers (2020-08-30T17:28:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.