What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?
- URL: http://arxiv.org/abs/2310.02219v2
- Date: Sat, 13 Jul 2024 18:18:09 GMT
- Title: What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?
- Authors: Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets
- Abstract summary: We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks.
We arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data augmentation and fine-tuning, also transfer to real-world performance.
- Score: 48.75469525877328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study involves five different PVRs, each trained for five distinct manipulation or indoor navigation tasks. We performed this evaluation using three different robots and two different policy learning paradigms. From this effort, we arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data augmentation and fine-tuning, also transfer to real-world performance. See project website for additional details and visuals.
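The recipe evaluated throughout the study is to plug a PVR in as the visual encoder (frozen or fine-tuned) and train only a task-specific policy head on top of its features. Below is a minimal sketch of that pattern, not the authors' code: the ResNet-18 backbone, feature dimension, and 7-DoF action space are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): a frozen
# pre-trained visual encoder feeding a small downstream policy head.
import torch
import torch.nn as nn
import torchvision.models as models


class PVRPolicy(nn.Module):
    def __init__(self, action_dim: int, embed_dim: int = 512):
        super().__init__()
        # ResNet-18 stands in for the PVR purely for illustration; any
        # pre-trained backbone could be dropped in the same way.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # strip classifier
        for p in self.encoder.parameters():
            p.requires_grad = False  # frozen PVR; fine-tuning would leave these trainable
        self.policy_head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                     # encoder stays fixed
            feats = self.encoder(obs).flatten(1)  # (B, embed_dim)
        return self.policy_head(feats)


# Example: a batch of 4 RGB observations and a hypothetical 7-DoF action space.
policy = PVRPolicy(action_dim=7)
actions = policy(torch.randn(4, 3, 224, 224))
print(actions.shape)  # torch.Size([4, 7])
```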
Related papers
- SPA: 3D Spatial-Awareness Enables Effective Embodied Representation [20.123243422061048]
We introduce SPA, a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI.
We present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators.
arXiv Detail & Related papers (2024-10-10T17:59:51Z)
- Value Explicit Pretraining for Learning Transferable Representations [11.069853883599102]
We propose a method that learns generalizable representations for transfer reinforcement learning.
We learn new tasks that share objectives with previously learned tasks by training an encoder for objective-conditioned representations.
Experiments using a realistic navigation simulator and the Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods.
arXiv Detail & Related papers (2023-12-19T17:12:35Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? [106.81451807227103]
We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI.
To study the effect of pre-training data size and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources.
Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either.
arXiv Detail & Related papers (2023-03-31T17:56:33Z)
- Offline Visual Representation Learning for Embodied Navigation [50.442660137987275]
The method combines offline pretraining of visual representations with self-supervised learning (SSL) and online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules (see the augmentation sketch after this list).
arXiv Detail & Related papers (2022-04-27T23:22:43Z)
- The Unsurprising Effectiveness of Pre-Trained Vision Models for Control [33.30717429522186]
We study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets.
We find that pre-trained visual representations can be competitive with, or even better than, ground-truth state representations for training control policies.
arXiv Detail & Related papers (2022-03-07T18:26:14Z)
- On Embodied Visual Navigation in Real Environments Through Habitat [20.630139085937586]
Visual navigation models based on deep learning can learn effective policies when trained on large amounts of visual observations, but collecting such observations in the real world is costly.
To deal with this limitation, several simulation platforms have been proposed in order to train visual navigation policies in virtual environments efficiently.
We show that our tool can effectively help to train and evaluate navigation policies on real-world observations without running navigation episodes in the real world.
arXiv Detail & Related papers (2020-10-26T09:19:07Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- SimAug: Learning Robust Representations from Simulation for Trajectory Prediction [78.91518036949918]
We propose a novel approach to learn robust representations by augmenting the simulation training data.
We show that SimAug achieves promising results on three real-world benchmarks using zero real training data.
arXiv Detail & Related papers (2020-04-04T21:22:01Z)
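Several of the papers above lean on image augmentation during training, e.g. the offline-pretraining/online-finetuning recipe for embodied navigation and SimAug's augmented simulation data. The following is a generic sketch of such an augmentation pipeline; the specific transforms and parameters are illustrative assumptions rather than any paper's exact configuration.

```python
# Illustrative sketch only: a generic image-augmentation pipeline of the kind
# used when finetuning visuomotor representations. Transform choices and
# parameters are assumptions, not taken from any of the papers above.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),               # random crops
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
])

# Example: apply the pipeline to a stand-in for a rendered simulator frame.
frame = Image.new("RGB", (256, 256))
obs = augment(frame)   # tensor of shape (3, 224, 224)
print(obs.shape)
```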
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences.