The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
- URL: http://arxiv.org/abs/2203.03580v1
- Date: Mon, 7 Mar 2022 18:26:14 GMT
- Title: The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
- Authors: Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta
- Abstract summary: We study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets.
We find that pre-trained visual representations can be competitive with, or even better than, ground-truth state representations for training control policies.
- Score: 33.30717429522186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen the emergence of pre-trained representations as a
powerful abstraction for AI applications in computer vision, natural language,
and speech. However, policy learning for control is still dominated by a
tabula-rasa learning paradigm, with visuo-motor policies often trained from
scratch using data from deployment environments. In this context, we revisit
and study the role of pre-trained visual representations for control, and in
particular representations trained on large-scale computer vision datasets.
Through extensive empirical evaluation in diverse control domains (Habitat,
DeepMind Control, Adroit, Franka Kitchen), we isolate and study the importance
of different representation training methods, data augmentations, and feature
hierarchies. Overall, we find that pre-trained visual representations can be
competitive with, or even better than, ground-truth state representations for
training control policies. This is in spite of using only out-of-domain data from
standard vision datasets, without any in-domain data from the deployment
environments. Additional details and source code are available at
https://sites.google.com/view/pvr-control
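
The recipe the abstract describes, freezing a representation pre-trained on standard vision datasets and training only a control policy on top of it, can be sketched roughly as follows. This is an illustrative PyTorch sketch, not the authors' released code: the torchvision ResNet-50 encoder, the ImageNet weights, the 2048-dimensional feature size, the MLP policy head, and the action dimension are all assumptions made for the example.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Frozen, off-the-shelf vision encoder pre-trained on a standard vision dataset
# (here: an ImageNet-supervised ResNet-50 from torchvision >= 0.13; this choice
# is an assumption for illustration, not necessarily the paper's best model).
encoder = models.resnet50(weights="IMAGENET1K_V1")
encoder.fc = nn.Identity()              # drop the classifier, keep 2048-d features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False             # the representation stays frozen

# Standard ImageNet preprocessing for the raw RGB observation.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Small policy head trained from scratch on top of the frozen features.
# Hidden sizes and the action dimension are placeholders.
ACTION_DIM = 7
policy = nn.Sequential(
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)

def act(image):
    """Map a raw RGB observation (a PIL image) to an action."""
    with torch.no_grad():
        features = encoder(preprocess(image).unsqueeze(0))   # (1, 2048)
    return policy(features)
```

In the paper's setting, such a policy head would then be trained with imitation or reinforcement learning in the deployment environment (Habitat, DeepMind Control, Adroit, Franka Kitchen), while the frozen encoder never sees any in-domain data.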
Related papers
- DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors [13.700885996266457]
Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents.
We present the DeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents.
Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods.
arXiv Detail & Related papers (2024-09-26T23:07:01Z)
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks [52.766795949716986]
We present a study of the generalization capabilities of the pre-trained visual representations at the categorical level.
We propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy.
arXiv Detail & Related papers (2023-07-07T13:01:29Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward, fully self-supervised framework curated for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns the driving policy representation by predicting the future ego-motion and optimizing the photometric error based on the current visual observation only (a minimal sketch of this photometric objective appears at the end of this list).
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Pretrained Encoders are All You Need [23.171881382391074]
Self-supervised models have shown successful transfer to diverse settings.
We also explore fine-tuning pretrained representations with self-supervised techniques.
Our results show that pretrained representations are on par with state-of-the-art self-supervised methods trained on domain-specific data.
arXiv Detail & Related papers (2021-06-09T15:27:25Z)
- Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
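
To make the two-stage PPGeo description above more concrete, the sketch below illustrates the kind of photometric objective it refers to: a predicted depth map for the current frame and a predicted ego-motion are used to warp a neighbouring frame into the current view, and the networks are trained to minimize the photometric discrepancy. This is a generic, hedged illustration of photometric self-supervision (in the style of self-supervised monocular depth methods), not PPGeo's actual code; the camera intrinsics `K` and the depth and pose predictions are placeholder inputs.

```python
import torch
import torch.nn.functional as F

def inverse_warp(src_img, tgt_depth, pose_src_from_tgt, K):
    """Warp a source frame into the target view using the target depth and relative pose.

    src_img:           (B, 3, H, W) neighbouring (source) frame
    tgt_depth:         (B, 1, H, W) predicted depth of the target frame
    pose_src_from_tgt: (B, 4, 4) predicted rigid transform from target to source camera
    K:                 (B, 3, 3) camera intrinsics (placeholder: assumed known or predicted)
    """
    B, _, H, W = src_img.shape
    device = src_img.device

    # Homogeneous pixel grid of the target frame: (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D points in the target camera, then move them to the source camera.
    cam_pts = torch.linalg.inv(K) @ pix * tgt_depth.view(B, 1, -1)             # (B, 3, H*W)
    cam_pts = torch.cat([cam_pts, torch.ones(B, 1, H * W, device=device)], 1)  # homogeneous
    src_pts = (pose_src_from_tgt @ cam_pts)[:, :3]                             # (B, 3, H*W)

    # Project into the source image and normalize coordinates to [-1, 1] for grid_sample.
    proj = K @ src_pts
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)

def photometric_loss(tgt_img, src_img, tgt_depth, pose_src_from_tgt, K):
    """L1 photometric error between the target frame and the warped source frame."""
    warped = inverse_warp(src_img, tgt_depth, pose_src_from_tgt, K)
    return (tgt_img - warped).abs().mean()
```

Per the PPGeo summary above, the first stage would obtain depth and pose jointly from two consecutive frames, while the second stage asks the visual encoder to predict the future ego-motion from the current observation alone and trains it through this same photometric error.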
This list is automatically generated from the titles and abstracts of the papers in this site.