SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
- URL: http://arxiv.org/abs/2106.09678v1
- Date: Thu, 17 Jun 2021 17:28:18 GMT
- Title: SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
- Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke
Zhu, Anima Anandkumar
- Abstract summary: Generalization has been a long-standing challenge for reinforcement learning (RL).
In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift.
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
- Score: 87.78260740602674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization has been a long-standing challenge for reinforcement learning
(RL). Visual RL, in particular, can be easily distracted by irrelevant factors
in high-dimensional observation space. In this work, we consider robust policy
learning which targets zero-shot generalization to unseen visual environments
with large distributional shift. We propose SECANT, a novel self-expert cloning
technique that leverages image augmentation in two stages to decouple robust
representation learning from policy optimization. Specifically, an expert
policy is first trained by RL from scratch with weak augmentations. A student
network then learns to mimic the expert policy by supervised learning with
strong augmentations, making its representation more robust against visual
variations compared to the expert. Extensive experiments demonstrate that
SECANT significantly advances the state of the art in zero-shot generalization
across 4 challenging domains. Our average reward improvements over prior SOTAs
are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based
autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code
release and video are available at https://linxifan.github.io/secant-site/.
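To make the two-stage recipe concrete, the following is a minimal PyTorch sketch of self-expert cloning. It is an illustration rather than the authors' released code: the augmentations, network sizes, and random stand-in batches are hypothetical placeholders, and the stage-1 RL loop (e.g. SAC on weakly augmented frames) is elided.

```python
# Illustrative sketch of SECANT's two-stage recipe (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def weak_augment(obs):
    """Weak augmentation: pad-and-crop with a small random shift."""
    padded = F.pad(obs, (4, 4, 4, 4), mode="replicate")
    i = torch.randint(0, 9, (1,)).item()
    j = torch.randint(0, 9, (1,)).item()
    return padded[..., i:i + 84, j:j + 84]

def strong_augment(obs):
    """Strong augmentation: additive noise plus a cutout band."""
    obs = (obs + 0.1 * torch.randn_like(obs)).clamp(0.0, 1.0)
    x = torch.randint(0, 84 - 16, (1,)).item()
    obs[..., x:x + 16] = 0.0
    return obs

class PolicyNet(nn.Module):
    """Tiny conv policy for 84x84 RGB frames; sizes are arbitrary."""
    def __init__(self, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, action_dim))

    def forward(self, obs):
        return self.net(obs)

expert, student = PolicyNet(), PolicyNet()

# Stage 1 (elided): train `expert` with any RL algorithm from scratch on
# weakly augmented observations, i.e. feed weak_augment(obs) to the learner.

# Stage 2: self-expert cloning. The student never sees rewards; it
# regresses the frozen expert's actions from strongly augmented views.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
for step in range(100):
    obs = torch.rand(32, 3, 84, 84)          # stand-in for replay frames
    with torch.no_grad():
        target = expert(obs)                 # expert label from the clean view
    pred = student(strong_augment(obs))      # student sees the corrupted view
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because stage 2 is plain supervised learning, the heavy augmentations never destabilize policy optimization; they only shape the student's representation.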
Related papers
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies.
We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations.
Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z)
- A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning [12.889687274108248]
Q-learning algorithms are prone to overfitting and training instabilities when trained from visual observations.
We propose a generalized recipe, SADA, that works with a wider variety of augmentations.
We find that SADA greatly improves the training stability and generalization of RL agents across a diverse set of augmentations (a hedged stabilized-augmentation sketch appears after this list).
arXiv Detail & Related papers (2024-05-27T17:58:23Z)
- An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation [47.61391583947082]
Visuomotor policies learn control mechanisms directly from high-dimensional visual observations.
Data augmentation emerges as a promising method for bridging generalization gaps by enriching data variety.
We propose to improve the generalization ability of visuomotor policies while preserving training stability, approaching the problem from two directions.
arXiv Detail & Related papers (2024-01-17T15:05:00Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning [16.999444076456268]
Thinker is a bootstrapping method that removes the adversarial effects of confounding features from observations in an unsupervised way.
Thinker is widely applicable across many deep reinforcement learning settings.
arXiv Detail & Related papers (2022-07-15T20:49:45Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns the difference of augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages the model to preserve augmentation-aware information in its learned representations, which can benefit their transferability (an AugSelf-style sketch appears after this list).
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Unsupervised Visual Attention and Invariance for Reinforcement Learning [25.673868326662024]
We develop an independent module to disperse interference factors irrelevant to the task, thereby providing "clean" observations for the vision-based reinforcement learning policy.
All components are optimized in an unsupervised way, without manual annotation or access to environment internals.
VAI empirically shows powerful generalization capabilities and significantly outperforms the current state-of-the-art (SOTA) methods by 15% to 49% on the DeepMind Control suite benchmark.
arXiv Detail & Related papers (2021-04-07T05:28:01Z)
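The summary above does not spell out SADA's recipe, so the sketch below shows a hedged stand-in from the same line of work: the SVEA-style stabilization in which the online Q-network trains on augmented frames while the bootstrapped TD target is computed from unaugmented frames. The noise augmentation, network sizes, and random stand-in batch are illustrative assumptions, not SADA's exact method.

```python
# Hedged sketch of stabilized data augmentation for visual Q-learning
# (SVEA-style stand-in; an assumption, not necessarily SADA's exact recipe).
import torch
import torch.nn as nn
import torch.nn.functional as F

def augment(obs):
    """Example augmentation: additive noise overlay."""
    return (obs + 0.2 * torch.randn_like(obs)).clamp(0.0, 1.0)

class QNet(nn.Module):
    """Tiny conv Q-network for 84x84 RGB frames; sizes are arbitrary."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, n_actions))

    def forward(self, obs):
        return self.net(obs)

q, q_target = QNet(), QNet()
q_target.load_state_dict(q.state_dict())
optimizer = torch.optim.Adam(q.parameters(), lr=1e-4)
gamma = 0.99

for step in range(100):
    # Stand-in replay batch (obs, action, reward, next_obs).
    obs = torch.rand(32, 3, 84, 84)
    action = torch.randint(0, 4, (32,))
    reward = torch.rand(32)
    next_obs = torch.rand(32, 3, 84, 84)

    # Key stabilization: the TD target uses *unaugmented* frames only,
    # so heavy augmentation cannot corrupt the bootstrap signal.
    with torch.no_grad():
        td_target = reward + gamma * q_target(next_obs).max(dim=1).values

    # The online Q-network is trained on the augmented view of the frames.
    q_pred = q(augment(obs)).gather(1, action.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_pred, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```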
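The AugSelf objective summarized above also lends itself to a compact sketch: alongside whatever main self-supervised loss is used, an auxiliary head regresses the difference between the augmentation parameters of two randomly augmented views. This is a minimal sketch assuming a crop-only augmentation whose (row, col) offsets are the parameters; sizes and names are illustrative.

```python
# Hedged sketch of an AugSelf-style auxiliary loss (not the paper's code):
# regress the difference in augmentation parameters between two views so
# the representation retains augmentation-aware information.
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_crop(img, size=64):
    """Crop each image with its own offset; return crops and (row, col) offsets."""
    b, _, h, w = img.shape
    crops, offsets = [], []
    for k in range(b):
        i = torch.randint(0, h - size + 1, (1,)).item()
        j = torch.randint(0, w - size + 1, (1,)).item()
        crops.append(img[k, :, i:i + size, j:j + size])
        offsets.append([float(i), float(j)])
    return torch.stack(crops), torch.tensor(offsets)

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 6 * 6, 128))
augself_head = nn.Linear(256, 2)   # regresses the offset difference

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(augself_head.parameters()), lr=3e-4)

for step in range(100):
    img = torch.rand(16, 3, 96, 96)          # stand-in image batch
    v1, p1 = random_crop(img)
    v2, p2 = random_crop(img)
    z1, z2 = encoder(v1), encoder(v2)
    # The main self-supervised loss (SimCLR, MoCo, BYOL, ...) would be
    # computed from z1/z2 here; AugSelf adds this regression on top,
    # typically weighted and summed with the main loss.
    pred = augself_head(torch.cat([z1, z2], dim=-1))
    aux_loss = F.mse_loss(pred, p1 - p2)
    optimizer.zero_grad()
    aux_loss.backward()
    optimizer.step()
```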