Scaling Robot Learning with Semantically Imagined Experience
- URL: http://arxiv.org/abs/2302.11550v1
- Date: Wed, 22 Feb 2023 18:47:51 GMT
- Title: Scaling Robot Learning with Semantically Imagined Experience
- Authors: Tianhe Yu, Ted Xiao, Austin Stone, Jonathan Tompson, Anthony Brohan,
Su Wang, Jaspiar Singh, Clayton Tan, Dee M, Jodilyn Peralta, Brian Ichter,
Karol Hausman, Fei Xia
- Abstract summary: Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
- Score: 21.361979238427722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in robot learning have shown promise in enabling robots to
perform a variety of manipulation tasks and generalize to novel scenarios. One
of the key contributing factors to this progress is the scale of robot data
used to train the models. To obtain large-scale datasets, prior approaches have
relied on either demonstrations requiring high human involvement or
engineering-heavy autonomous data collection schemes, both of which are
challenging to scale. To mitigate this issue, we propose an alternative route
and leverage text-to-image foundation models widely used in computer vision and
natural language processing to obtain meaningful data for robot learning
without requiring additional robot data. We term our method Robot Learning with
Semantically Imagined Experience (ROSIE). Specifically, we make use of
state-of-the-art text-to-image diffusion models and perform aggressive data
augmentation on top of our existing robotic manipulation datasets via
inpainting various unseen objects for manipulation, backgrounds, and
distractors with text guidance. Through extensive real-world experiments, we
show that manipulation policies trained on data augmented this way are able to
solve completely unseen tasks with new objects and behave more robustly with
respect to novel distractors. In addition, we find that we can improve the
robustness and generalization of high-level robot learning tasks such as
success detection through training with the diffusion-based data augmentation.
The project's website and videos can be found at diffusion-rosie.github.io
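To make the augmentation recipe concrete, below is a minimal sketch of text-guided inpainting applied to a single robot-camera frame. The paper builds on Imagen Editor with masks produced by open-vocabulary segmentation; this sketch instead assumes the open-source diffusers inpainting pipeline, a precomputed object mask, and hypothetical file paths, so it illustrates the idea rather than the authors' exact implementation.
```python
# Sketch of diffusion-based, text-guided inpainting augmentation in the spirit
# of ROSIE. Model name, mask source, and file paths are assumptions, not the
# paper's actual pipeline (which uses Imagen Editor with segmentation masks).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

def augment_frame(frame: Image.Image, mask: Image.Image, prompt: str) -> Image.Image:
    """Inpaint the masked region of a robot-camera frame with a text-described
    object, background, or distractor; pixels outside the mask (and hence the
    episode's original action labels) are left untouched."""
    return pipe(
        prompt=prompt,
        image=frame.resize((512, 512)),
        mask_image=mask.resize((512, 512)),
    ).images[0]

# Example: swap the original target object for an unseen one, then relabel the
# episode with a new language instruction ("pick up the blue ceramic mug").
frame = Image.open("episode_0/frame_013.png").convert("RGB")
mask = Image.open("episode_0/frame_013_object_mask.png").convert("L")
augmented = augment_frame(frame, mask, "a blue ceramic mug on the table")
augmented.save("episode_0_aug/frame_013.png")
```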
Related papers
- Semantically Controllable Augmentations for Generalizable Robot Learning [40.89398799604755]
Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training.
We propose a generative augmentation framework for semantically controllable augmentations and rapidly multiplying robot datasets.
arXiv Detail & Related papers (2024-09-02T05:25:34Z)
- HRP: Human Affordances for Robotic Pre-Training [15.92416819748365]
We present a framework for pre-training representations on hand, object, and contact.
We experimentally demonstrate (using 3000+ robot trials) that this affordance pre-training scheme boosts performance by a minimum of 15% on 5 real-world tasks.
arXiv Detail & Related papers (2024-07-26T17:59:52Z)
- Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans [58.27029676638521]
We show how passive human videos can serve as a rich source of data for learning generalist robots.
We learn a human plan predictor that, given a current image of a scene and a goal image, predicts the future hand and object configurations.
We show that our learned system can perform over 16 manipulation skills that generalize to 40 objects.
arXiv Detail & Related papers (2023-12-01T18:54:12Z)
- Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods [14.780597545674157]
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives.
We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
arXiv Detail & Related papers (2023-08-07T14:24:52Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)