Learning to See before Learning to Act: Visual Pre-training for
Manipulation
- URL: http://arxiv.org/abs/2107.00646v1
- Date: Thu, 1 Jul 2021 17:58:37 GMT
- Title: Learning to See before Learning to Act: Visual Pre-training for
Manipulation
- Authors: Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin
- Abstract summary: We find that pre-training on vision tasks significantly improves generalization and sample efficiency for learning to manipulate objects.
We explore directly transferring model parameters from vision networks to affordance prediction networks, and show that this can result in successful zero-shot adaptation.
With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.
- Score: 48.731528716324355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Does having visual priors (e.g. the ability to detect objects) facilitate
learning to perform vision-based manipulation (e.g. picking up objects)? We
study this problem under the framework of transfer learning, where the model is
first trained on a passive vision task, and adapted to perform an active
manipulation task. We find that pre-training on vision tasks significantly
improves generalization and sample efficiency for learning to manipulate
objects. However, realizing these gains requires careful selection of which
parts of the model to transfer. Our key insight is that outputs of standard
vision models highly correlate with affordance maps commonly used in
manipulation. Therefore, we explore directly transferring model parameters from
vision networks to affordance prediction networks, and show that this can
result in successful zero-shot adaptation, where a robot can pick up certain
objects with zero robotic experience. With just a small amount of robotic
experience, we can further fine-tune the affordance model to achieve better
results. With just 10 minutes of suction experience or 1 hour of grasping
experience, our method achieves ~80% success rate at picking up novel objects.
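
To make the parameter-transfer recipe described in the abstract concrete, here is a minimal, hypothetical sketch (not the authors' released code). A torchvision segmentation network stands in for the "standard vision model"; its backbone parameters initialize a per-pixel affordance predictor, which is then fine-tuned on a small set of robot trials recorded as (image, attempted pixel, success) tuples. The names `AffordanceNet` and `finetune`, the choice of `fcn_resnet50`, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of the transfer recipe:
# initialize a per-pixel affordance predictor from a pre-trained vision
# network, then fine-tune on a small amount of robot picking experience.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.segmentation import fcn_resnet50  # stand-in vision model


class AffordanceNet(nn.Module):
    """Predicts a per-pixel map of pick (suction/grasp) success probability."""

    def __init__(self, pretrained: bool = True):
        super().__init__()
        # A COCO-trained segmentation FCN stands in for the "standard vision model".
        seg = fcn_resnet50(weights="DEFAULT" if pretrained else None)
        self.backbone = seg.backbone                   # transferred parameters
        self.head = nn.Conv2d(2048, 1, kernel_size=1)  # new affordance head (random init)

    def forward(self, rgb):                            # rgb: (B, 3, H, W)
        feats = self.backbone(rgb)["out"]              # (B, 2048, H/8, W/8)
        logits = F.interpolate(self.head(feats), size=rgb.shape[-2:],
                               mode="bilinear", align_corners=False)
        return torch.sigmoid(logits)                   # (B, 1, H, W) affordance map


def finetune(model, loader, epochs=5, lr=1e-4):
    """Fine-tune on robot trials: batches of (rgb, pixel_uv, success), pixel_uv integer (u, v)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()
    for _ in range(epochs):
        for rgb, pixel_uv, success in loader:
            pred = model(rgb)                                    # (B, 1, H, W)
            b = torch.arange(rgb.size(0))
            picked = pred[b, 0, pixel_uv[:, 1], pixel_uv[:, 0]]  # affordance at tried pixel
            loss = bce(picked, success.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Note that this sketch re-initializes the output head, so it illustrates only the fine-tuning path; the zero-shot result quoted in the abstract instead comes from transferring parameters so that the vision model's own outputs already serve as an affordance map.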
Related papers
- Latent Action Pretraining from Videos [156.88613023078778]
We introduce Latent Action Pretraining for general Action models (LAPA).
LAPA is an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels.
We propose a method to learn from internet-scale videos that do not have robot action labels.
arXiv Detail & Related papers (2024-10-15T16:28:09Z)
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning [6.709078873834651]
Theia is a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks.
Theia's rich visual representations encode diverse visual knowledge, enhancing downstream robot learning.
arXiv Detail & Related papers (2024-07-29T17:08:21Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pretext tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods [14.780597545674157]
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives.
We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
arXiv Detail & Related papers (2023-08-07T14:24:52Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective (see the sketch after this list).
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning [2.8388425545775386]
We present end-to-end SE(3)-equivariant models for visual robotic manipulation from a point cloud input.
We show that our models can learn from scratch without prior knowledge yet are highly sample efficient.
arXiv Detail & Related papers (2022-06-16T17:26:06Z)
- What Can I Do Here? Learning New Skills by Imagining Visual Affordances [128.65223577406587]
We show how generative models of possible outcomes can allow a robot to learn visual representations of affordances.
In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model.
We show that visuomotor affordance learning (VAL) can be used to train goal-conditioned policies that operate on raw image inputs.
arXiv Detail & Related papers (2021-06-01T17:58:02Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
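
The distance-to-goal reward mentioned above for "Learning Reward Functions for Robotic Manipulation by Observing Humans" can be pictured with the short, hypothetical sketch below: an encoder (assumed to have been trained beforehand with a time-contrastive objective on human videos) embeds the current observation and a goal image, and the reward is the negative distance between the two embeddings. The encoder architecture and names here are illustrative assumptions, not the cited paper's implementation.

```python
# Minimal sketch of a distance-to-goal reward in a learned embedding space.
# The encoder is assumed to have been trained with a time-contrastive
# objective beforehand; architecture and reward shaping are illustrative.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps an RGB observation to a low-dimensional embedding."""

    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, img):                 # img: (B, 3, H, W)
        return self.net(img)                # (B, emb_dim)


@torch.no_grad()
def embedding_reward(phi: nn.Module, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    """Reward = negative L2 distance between current and goal embeddings."""
    return -torch.norm(phi(obs) - phi(goal), dim=-1)
```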
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.