Learning to Play by Imitating Humans
- URL: http://arxiv.org/abs/2006.06874v1
- Date: Thu, 11 Jun 2020 23:28:54 GMT
- Title: Learning to Play by Imitating Humans
- Authors: Rostam Dinyari and Pierre Sermanet and Corey Lynch
- Abstract summary: We show that it is possible to acquire a diverse set of skills by self-supervising control on top of human teleoperated play data.
By training a behavioral cloning policy on a relatively small quantity of human play, we autonomously generate a large quantity of cloned play data.
We demonstrate that a general-purpose goal-conditioned policy trained on this augmented dataset substantially outperforms one trained only with the original human data.
- Score: 8.209859328381269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acquiring multiple skills has commonly involved collecting a large number of
expert demonstrations per task or engineering custom reward functions. Recently
it has been shown that it is possible to acquire a diverse set of skills by
self-supervising control on top of human teleoperated play data. Play is rich
in state space coverage and a policy trained on this data can generalize to
specific tasks at test time, outperforming policies trained on individual expert
task demonstrations. In this work, we explore the question of whether robots
can learn to play to autonomously generate play data that can ultimately
enhance performance. By training a behavioral cloning policy on a relatively
small quantity of human play, we autonomously generate a large quantity of
cloned play data that can be used as additional training data. We demonstrate that a
general-purpose goal-conditioned policy trained on this augmented dataset
substantially outperforms one trained only with the original human data on 18
difficult user-specified manipulation tasks in a simulated robotic tabletop
environment. A video example of a robot imitating human play can be seen here:
https://learning-to-play.github.io/videos/undirected_play1.mp4
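The augmentation loop the abstract describes (train behavioral cloning on a small amount of human play, roll the cloned policy out to generate more play, then retrain on the combined data) can be sketched in toy form. This is a minimal illustration, not the paper's implementation: the "environment" is random states, the policy class is a linear map fit by least squares, and all function names are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4

def collect_human_play(n=200):
    # Toy stand-in for human teleoperated play: states paired with
    # actions from an unknown linear "teacher".
    W_true = rng.normal(size=(DIM, DIM))
    states = rng.normal(size=(n, DIM))
    return states, states @ W_true

def fit_bc_policy(states, actions):
    # Behavioral cloning reduced to least-squares regression: action ~ state @ W.
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

def generate_cloned_play(policy_W, n=1000):
    # Roll the cloned policy out on fresh states to autonomously
    # produce a larger quantity of "cloned play" data.
    states = rng.normal(size=(n, DIM))
    return states, states @ policy_W

# 1. Train a BC policy on a relatively small quantity of human play.
h_states, h_actions = collect_human_play()
bc_W = fit_bc_policy(h_states, h_actions)

# 2. Autonomously generate a much larger quantity of cloned play.
c_states, c_actions = generate_cloned_play(bc_W)

# 3. Retrain on the augmented (human + cloned) dataset.
aug_states = np.vstack([h_states, c_states])
aug_actions = np.vstack([h_actions, c_actions])
final_W = fit_bc_policy(aug_states, aug_actions)

print(aug_states.shape)  # (1200, 4)
```

In the paper the final policy is goal-conditioned and trained on real play trajectories; the sketch only shows the shape of the data loop, in which most of the final training set is self-generated.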
Related papers
- Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training [69.54948297520612]
Learning a generalist embodied agent poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets.
We introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos.
Our method generates high-fidelity future videos for planning and enhances the fine-tuned policies compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-02-22T09:48:47Z)
- Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We learn our policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
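The reward described above (negative distance to a goal in a learned embedding space) can be sketched as follows. This is only an illustration of the reward's form under stated assumptions: a fixed random linear map followed by `tanh` stands in for the learned time-contrastive encoder, and all names here are hypothetical.

```python
import numpy as np

def embed(obs, W):
    # Stand-in for a learned time-contrastive encoder; here a fixed
    # nonlinear projection of the observation.
    return np.tanh(obs @ W)

def reward(obs, goal, W):
    # Reward is the negative distance to the goal in embedding space:
    # closer to the goal (as the encoder sees it) means higher reward.
    return -np.linalg.norm(embed(obs, W) - embed(goal, W))

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 3))     # placeholder "encoder" weights
goal = rng.normal(size=8)
near = goal + 0.01 * rng.normal(size=8)  # observation close to the goal
far = goal + 5.0 * rng.normal(size=8)    # observation far from the goal

print(reward(near, goal, W), reward(far, goal, W))
```

The maximum reward (zero) is attained exactly at the goal, and states nearer the goal in embedding space score higher than distant ones, which is the property the policy is trained against.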
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
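DVD's core idea (score whether two videos perform the same task, and use that score as a reward) can be sketched in toy form. Everything here is a stand-in, not DVD itself: task-clustered random vectors replace video features, and cosine similarity replaces the trained discriminator.

```python
import numpy as np

rng = np.random.default_rng(2)

def video_features(task_id):
    # Stand-in for video features: each task is a noisy cluster in feature space.
    centers = np.eye(4)
    return centers[task_id] + 0.1 * rng.normal(size=4)

def same_task_score(v1, v2):
    # A trained discriminator would score whether two videos show the same
    # task; cosine similarity between features stands in for that score.
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

human_video = video_features(0)  # human demo of task 0
robot_good = video_features(0)   # robot attempt at the same task
robot_bad = video_features(1)    # robot doing a different task

# Used as a reward: a robot video matching the demo's task scores higher.
print(same_task_score(human_video, robot_good),
      same_task_score(human_video, robot_bad))
```

In the paper this score is what visual model predictive control maximizes when planning from a single human demonstration.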
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.