Robust Policies via Mid-Level Visual Representations: An Experimental
Study in Manipulation and Navigation
- URL: http://arxiv.org/abs/2011.06698v1
- Date: Fri, 13 Nov 2020 00:16:05 GMT
- Title: Robust Policies via Mid-Level Visual Representations: An Experimental
Study in Manipulation and Navigation
- Authors: Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese,
Amir Zamir, Jitendra Malik, Lerrel Pinto
- Abstract summary: We study the effects of using mid-level visual representations as a generic and easy-to-decode perceptual state in an end-to-end RL framework.
We show that they aid generalization, improve sample complexity, and lead to a higher final performance.
In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed.
- Score: 115.4071729927011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based robotics often separates the control loop into one module for
perception and a separate module for control. It is possible to train the whole
system end-to-end (e.g. with deep RL), but doing it "from scratch" comes with a
high sample complexity cost and the final result is often brittle, failing
unexpectedly if the test environment differs from that of training.
We study the effects of using mid-level visual representations (features
learned asynchronously for traditional computer vision objectives), as a
generic and easy-to-decode perceptual state in an end-to-end RL framework.
Mid-level representations encode invariances about the world, and we show that
they aid generalization, improve sample complexity, and lead to a higher final
performance. Compared to other approaches for incorporating invariances, such
as domain randomization, asynchronously trained mid-level representations scale
better: both to harder problems and to larger domain shifts. In practice, this
means that mid-level representations could be used to successfully train
policies for tasks where domain randomization and learning-from-scratch failed.
We report results on both manipulation and navigation tasks, and for navigation
include zero-shot sim-to-real experiments on real robots.
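As a concrete illustration, the sketch below (Python/PyTorch, not the authors' code) shows the basic arrangement the abstract describes: a pretrained mid-level encoder is kept frozen and supplies the perceptual state, while only a small policy head is trained with RL. MidLevelEncoder and PolicyHead are hypothetical stand-ins; in the paper the features come from networks trained asynchronously on standard vision objectives (e.g., depth or surface-normal estimation).

    import torch
    import torch.nn as nn

    class MidLevelEncoder(nn.Module):
        """Hypothetical stand-in for a pretrained mid-level vision network (kept frozen)."""
        def __init__(self, feat_dim=128):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            for p in self.parameters():
                p.requires_grad_(False)  # trained asynchronously; not updated by the RL loss

        def forward(self, rgb):
            return self.backbone(rgb)

    class PolicyHead(nn.Module):
        """Small head that decodes mid-level features into action scores."""
        def __init__(self, feat_dim=128, action_dim=4):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                     nn.Linear(64, action_dim))

        def forward(self, features):
            return self.net(features)

    encoder, policy = MidLevelEncoder(), PolicyHead()
    obs = torch.rand(1, 3, 84, 84)        # a single RGB observation
    with torch.no_grad():                 # the perception module stays fixed
        state = encoder(obs)
    action_scores = policy(state)         # only this head receives RL gradients

Because the encoder is fixed, the RL problem reduces to learning a small decoder on top of features that already encode useful invariances, which is where the sample-efficiency and generalization gains reported in the abstract come from.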
Related papers
- DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors [13.700885996266457]
Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents.
We present the DeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents.
Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods.
arXiv Detail & Related papers (2024-09-26T23:07:01Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).
Experimental results demonstrate that MPI improves on the previous state of the art by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Invariance is Key to Generalization: Examining the Role of Representation in Sim-to-Real Transfer for Visual Navigation [35.01394611106655]
The key to generalization is a representation rich enough to capture all task-relevant information.
We experimentally study such a representation for visual navigation.
We show that our representation reduces the A-distance between the training and test domains.
arXiv Detail & Related papers (2023-10-23T15:15:19Z)
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate semantic tracklets on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning [69.23334811890919]
Deep Reinforcement Learning has proven able to solve many control tasks in different fields, but these systems often behave unexpectedly when deployed in real-world scenarios.
This is mainly due to the lack of domain adaptation between simulated and real-world data, together with the absence of a distinction between training and test datasets.
We present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios.
arXiv Detail & Related papers (2020-05-13T14:22:20Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
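The Laplacian Denoising Autoencoder entry above describes corrupting clean data in the gradient domain before denoising. The snippet below is a loose, hypothetical sketch of that corruption step only, assuming a simple per-channel Laplacian filter as the gradient-domain transform; the paper's actual pipeline may differ.

    import torch
    import torch.nn.functional as F

    def laplacian(img):
        # hypothetical sketch: a 3x3 per-channel Laplacian stands in for the
        # paper's gradient-domain transform
        k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        k = k.view(1, 1, 3, 3).repeat(img.shape[1], 1, 1, 1)
        return F.conv2d(img, k, padding=1, groups=img.shape[1])

    clean = torch.rand(8, 3, 32, 32)                           # batch of clean images
    noisy = laplacian(clean) + 0.1 * torch.randn_like(clean)   # noise added in the gradient domain
    # a denoising autoencoder would then be trained to map `noisy` back to `clean`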
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.