The Surprising Effectiveness of Representation Learning for Visual
Imitation
- URL: http://arxiv.org/abs/2112.01511v2
- Date: Mon, 6 Dec 2021 16:46:37 GMT
- Title: The Surprising Effectiveness of Representation Learning for Visual
Imitation
- Authors: Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam,
Lerrel Pinto
- Abstract summary: We propose to decouple representation learning from behavior learning for visual imitation.
First, we learn a visual representation encoder from offline data using standard supervised and self-supervised learning methods.
We experimentally show that this simple decoupling improves the performance of visual imitation models on both offline demonstration datasets and real-robot door opening compared to prior work in visual imitation.
- Score: 12.60653315718265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While visual imitation learning offers one of the most effective ways of
learning from visual demonstrations, generalizing from them requires either
hundreds of diverse demonstrations, task specific priors, or large,
hard-to-train parametric models. One reason such complexities arise is because
standard visual imitation frameworks try to solve two coupled problems at once:
learning a succinct but good representation from the diverse visual data, while
simultaneously learning to associate the demonstrated actions with such
representations. Such joint learning causes an interdependence between these
two problems, which often results in needing large amounts of demonstrations
for learning. To address this challenge, we instead propose to decouple
representation learning from behavior learning for visual imitation. First, we
learn a visual representation encoder from offline data using standard
supervised and self-supervised learning methods. Once the representations are
trained, we use non-parametric Locally Weighted Regression to predict the
actions. We experimentally show that this simple decoupling improves the
performance of visual imitation models on both offline demonstration datasets
and real-robot door opening compared to prior work in visual imitation. All of
our generated data, code, and robot videos are publicly available at
https://jyopari.github.io/VINN/.
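To make the decoupling concrete: once the encoder is frozen, behavior learning reduces to a non-parametric lookup over demonstration embeddings. The sketch below is a minimal illustration, not the authors' released code; it assumes demonstration frames have already been embedded by a pretrained encoder, and the specific kernel (a softmax over negative Euclidean distances among the k nearest neighbors) is an assumption rather than the paper's exact weighting.

```python
import numpy as np

def locally_weighted_action(query_emb, demo_embs, demo_actions, k=5):
    """Non-parametric action prediction over frozen encoder features.

    Averages the actions of the k nearest demonstration frames in embedding
    space, weighted by a softmax over negative Euclidean distances
    (kernel choice is an assumption for this sketch)."""
    dists = np.linalg.norm(demo_embs - query_emb, axis=1)  # (N,) distances
    nn_idx = np.argsort(dists)[:k]                         # k nearest demo frames
    weights = np.exp(-dists[nn_idx])
    weights /= weights.sum()
    return weights @ demo_actions[nn_idx]                  # weighted action average

# Usage with toy data: 100 demo frames, 64-dim embeddings, 7-dim actions.
rng = np.random.default_rng(0)
demo_embs = rng.normal(size=(100, 64))      # embeddings of demonstration frames
demo_actions = rng.normal(size=(100, 7))    # actions recorded for those frames
query_emb = rng.normal(size=64)             # embedding of the current observation
print(locally_weighted_action(query_emb, demo_embs, demo_actions, k=5))
```

Because the only learned component is the encoder, adding new demonstrations requires no retraining of a policy network; the new embedded frames simply join the lookup set.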
Related papers
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Playful Interactions for Representation Learning [82.59215739257104]
We propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks.
We collect 2 hours of playful data in 19 diverse environments and use self-predictive learning to extract visual representations.
Our representations generalize better than standard behavior cloning and can achieve similar performance with only half the number of required demonstrations.
arXiv Detail & Related papers (2021-07-19T17:54:48Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)