Unsupervised Visual Attention and Invariance for Reinforcement Learning
- URL: http://arxiv.org/abs/2104.02921v1
- Date: Wed, 7 Apr 2021 05:28:01 GMT
- Title: Unsupervised Visual Attention and Invariance for Reinforcement Learning
- Authors: Xudong Wang, Long Lian, Stella X. Yu
- Abstract summary: We develop an independent module to disperse interference factors irrelevant to the task, thereby providing "clean" observations for the vision-based reinforcement learning policy.
All components are optimized in an unsupervised way, without manual annotation or access to environment internals.
VAI empirically shows powerful generalization and significantly outperforms the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind Control suite benchmark.
- Score: 25.673868326662024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based reinforcement learning (RL) has achieved tremendous success.
However, generalizing a vision-based RL policy to unknown test environments
remains a challenging problem. Unlike previous works that focus on training
a universal RL policy that is invariant to discrepancies between the test and
training environments, we focus on developing an independent module to disperse
interference factors irrelevant to the task, thereby providing "clean"
observations for the RL policy.
The proposed unsupervised visual attention and invariance method (VAI)
contains three key components: 1) an unsupervised keypoint detection model
which captures semantically meaningful keypoints in observations; 2) an
unsupervised visual attention module which automatically generates the
distraction-invariant attention mask for each observation; 3) a self-supervised
adapter for visual distraction invariance which reconstructs the
distraction-invariant attention mask from observations with artificial
disturbances generated by a series of foreground and background augmentations.
All components are optimized in an unsupervised way, without manual annotation
or access to environment internals, and only the adapter is used at inference
time to provide distraction-free observations to the RL policy (a minimal
sketch follows the abstract).
VAI empirically shows powerful generalization and significantly outperforms
the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind
Control suite benchmark and by 61% to 229% on our proposed robot manipulation
benchmark, in terms of cumulative reward per episode.
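The abstract's training-and-inference pipeline can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation: all architectures, sizes, and the disturbance function are assumptions, and the keypoint-driven attention module of the actual method is stood in for here by a frozen mask network.

```python
# Minimal sketch of the VAI training signal, NOT the authors' code.
# Architectures, sizes, and the augmentation below are illustrative
# assumptions; the keypoint-supervised attention module is replaced
# by a frozen stand-in network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskNet(nn.Module):
    """Tiny conv encoder-decoder mapping an observation to a soft
    single-channel attention mask in [0, 1] at the same resolution."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def artificial_disturbance(obs: torch.Tensor) -> torch.Tensor:
    """Stand-in for the paper's foreground/background augmentations:
    blend random noise into the frame as a synthetic distraction."""
    alpha = 0.5 * torch.rand(obs.size(0), 1, 1, 1)
    return (1 - alpha) * obs + alpha * torch.rand_like(obs)

attention = MaskNet().eval()  # proxy for the keypoint-driven attention
adapter = MaskNet()           # trained to recover the mask under distraction
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

obs = torch.rand(8, 3, 84, 84)              # batch of clean observations
with torch.no_grad():
    target_mask = attention(obs)            # distraction-invariant target
pred_mask = adapter(artificial_disturbance(obs))
loss = F.mse_loss(pred_mask, target_mask)   # reconstruct the same mask
opt.zero_grad(); loss.backward(); opt.step()

# Inference: only the adapter runs; the policy sees the masked frame.
with torch.no_grad():
    distraction_free = adapter(obs) * obs   # input to the RL policy
```

At test time only the adapter sits in front of the policy, which matches the abstract's claim that inference needs no other component.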
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation [47.61391583947082]
Visuomotor policies learn control mechanisms directly from high-dimensional visual observations.
Data augmentation emerges as a promising method for bridging generalization gaps by enriching data variety.
We propose to improve the generalization ability of visuomotor policies while preserving training stability, from two aspects (a generic augmentation example follows this entry).
arXiv Detail & Related papers (2024-01-17T15:05:00Z)
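The entry above mentions data augmentation only in general terms. As an illustration, the snippet below implements random shift (pad-then-crop), a common image augmentation in visuomotor RL; it is not necessarily the augmentation used in that paper.

```python
# Illustrative random-shift augmentation (pad then randomly crop), a
# common choice in visuomotor RL; not necessarily the augmentation
# used in the paper summarized above.
import torch
import torch.nn.functional as F

def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Replicate-pad each frame by `pad` pixels, then crop a random
    window of the original size, shifting image content slightly."""
    n, _, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(obs)
    for i in range(n):
        top = int(torch.randint(0, 2 * pad + 1, (1,)))
        left = int(torch.randint(0, 2 * pad + 1, (1,)))
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

batch = torch.rand(16, 3, 84, 84)  # batch of stacked camera frames
augmented = random_shift(batch)    # enriched variants for training
```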
- Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations [1.8130068086063336]
We contribute to the field of unsupervised (visual) representation learning from three perspectives.
We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs).
We build upon the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics.
We contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection, and a method based on self-supervised learning.
arXiv Detail & Related papers (2023-11-30T15:57:55Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem into a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z)
- Digging into Uncertainty in Self-supervised Multi-view Stereo [57.04768354383339]
We propose a novel Uncertainty reduction Multi-view Stereo (UMVS) framework for self-supervised learning.
Our framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts.
arXiv Detail & Related papers (2021-08-30T02:53:08Z)
- Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [7.188571996124112]
We introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle.
This encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions.
We demonstrate that our approach can achieve SOTA performance on challenging visual control tasks, even when the background is replaced with natural videos (a schematic sketch follows this entry).
arXiv Detail & Related papers (2021-02-26T02:24:36Z)
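To make the bottleneck idea in the last entry concrete, here is a schematic sketch of an information-bottleneck-style auxiliary loss over two views of an observation. It is an illustration under assumed shapes and weights, not the exact MIB objective of that paper.

```python
# Schematic information-bottleneck-style auxiliary loss over two views
# of the same observation. This illustrates the general idea, not the
# exact MIB objective of the paper above; shapes, the agreement term,
# and the trade-off weight are all assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticEncoder(nn.Module):
    """Maps view features to a diagonal Gaussian over latents."""
    def __init__(self, in_dim: int = 128, z_dim: int = 32):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_std = nn.Linear(in_dim, z_dim)

    def forward(self, x: torch.Tensor):
        mu, log_std = self.mu(x), self.log_std(x).clamp(-5.0, 2.0)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterize
        return z, mu, log_std

enc = StochasticEncoder()
view1 = torch.randn(8, 128)  # features of two views of one frame,
view2 = torch.randn(8, 128)  # e.g. two random augmentations

z1, mu1, ls1 = enc(view1)
z2, mu2, ls2 = enc(view2)

# Agreement term: the two views should map to the same latent, so the
# representation keeps what is shared (task signal) across views.
agree = F.mse_loss(z1, mu2.detach()) + F.mse_loss(z2, mu1.detach())

# Compression term: KL to a standard normal prior squeezes out
# view-specific, task-irrelevant detail (the bottleneck).
kl = (-ls1 + 0.5 * (ls1.exp() ** 2 + mu1 ** 2) - 0.5).sum(-1).mean()

beta = 1e-3                   # bottleneck trade-off weight (assumed)
aux_loss = agree + beta * kl  # added to the usual RL losses
```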
This list is automatically generated from the titles and abstracts of the papers on this site.