Unsupervised Visual Attention and Invariance for Reinforcement Learning
- URL: http://arxiv.org/abs/2104.02921v1
- Date: Wed, 7 Apr 2021 05:28:01 GMT
- Title: Unsupervised Visual Attention and Invariance for Reinforcement Learning
- Authors: Xudong Wang, Long Lian, Stella X. Yu
- Abstract summary: We develop an independent module to disperse interference factors irrelevant to the task, thereby providing "clean" observations for the vision-based reinforcement learning policy.
All components are optimized in an unsupervised way, without manual annotation or access to environment internals.
VAI empirically shows powerful generalization and significantly outperforms the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind Control suite benchmark.
- Score: 25.673868326662024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based reinforcement learning (RL) has achieved tremendous success.
However, generalizing a vision-based RL policy to unknown test environments
remains a challenging problem. Unlike previous works that focus on training
a universal RL policy that is invariant to discrepancies between the test and
training environments, we focus on developing an independent module to disperse
interference factors irrelevant to the task, thereby providing "clean"
observations for the RL policy.
The proposed unsupervised visual attention and invariance method (VAI)
contains three key components: 1) an unsupervised keypoint detection model
which captures semantically meaningful keypoints in observations; 2) an
unsupervised visual attention module which automatically generates the
distraction-invariant attention mask for each observation; 3) a self-supervised
adapter for visual distraction invariance which reconstructs the
distraction-invariant attention mask from observations with artificial
disturbances generated by a series of foreground and background augmentations.
All components are optimized in an unsupervised way, without manual annotation
or access to environment internals, and only the adapter is used at inference
time to provide distraction-free observations to the RL policy (a minimal
sketch follows the abstract).
VAI empirically shows powerful generalization and significantly outperforms
the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind
Control suite benchmark and by 61% to 229% on our proposed robot manipulation
benchmark, in terms of cumulative reward per episode.
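The abstract's training-and-inference pipeline can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation: all architectures, sizes, and the disturbance function are assumptions, and the keypoint-driven attention module of the actual method is stood in for here by a frozen mask network.

```python
# Minimal sketch of the VAI training signal, NOT the authors' code.
# Architectures, sizes, and the augmentation below are illustrative
# assumptions; the keypoint-supervised attention module is replaced
# by a frozen stand-in network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskNet(nn.Module):
    """Tiny conv encoder-decoder mapping an observation to a soft
    single-channel attention mask in [0, 1] at the same resolution."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def artificial_disturbance(obs: torch.Tensor) -> torch.Tensor:
    """Stand-in for the paper's foreground/background augmentations:
    blend random noise into the frame as a synthetic distraction."""
    alpha = 0.5 * torch.rand(obs.size(0), 1, 1, 1)
    return (1 - alpha) * obs + alpha * torch.rand_like(obs)

attention = MaskNet().eval()  # proxy for the keypoint-driven attention
adapter = MaskNet()           # trained to recover the mask under distraction
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

obs = torch.rand(8, 3, 84, 84)              # batch of clean observations
with torch.no_grad():
    target_mask = attention(obs)            # distraction-invariant target
pred_mask = adapter(artificial_disturbance(obs))
loss = F.mse_loss(pred_mask, target_mask)   # reconstruct the same mask
opt.zero_grad(); loss.backward(); opt.step()

# Inference: only the adapter runs; the policy sees the masked frame.
with torch.no_grad():
    distraction_free = adapter(obs) * obs   # input to the RL policy
```

At test time only the adapter sits in front of the policy, which matches the abstract's claim that inference needs no other component.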
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation [47.61391583947082]
Visuomotor policies learn control mechanisms directly from high-dimensional visual observations.
Data augmentation emerges as a promising method for bridging generalization gaps by enriching data variety.
We propose to improve the generalization ability of visuomotor policies while preserving training stability, from two aspects (a generic augmentation example follows this entry).
arXiv Detail & Related papers (2024-01-17T15:05:00Z)
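The entry above mentions data augmentation only in general terms. As an illustration, the snippet below implements random shift (pad-then-crop), a common image augmentation in visuomotor RL; it is not necessarily the augmentation used in that paper.

```python
# Illustrative random-shift augmentation (pad then randomly crop), a
# common choice in visuomotor RL; not necessarily the augmentation
# used in the paper summarized above.
import torch
import torch.nn.functional as F

def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Replicate-pad each frame by `pad` pixels, then crop a random
    window of the original size, shifting image content slightly."""
    n, _, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(obs)
    for i in range(n):
        top = int(torch.randint(0, 2 * pad + 1, (1,)))
        left = int(torch.randint(0, 2 * pad + 1, (1,)))
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

batch = torch.rand(16, 3, 84, 84)  # batch of stacked camera frames
augmented = random_shift(batch)    # enriched variants for training
```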
- Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations [1.8130068086063336]
We contribute to the field of unsupervised (visual) representation learning from three perspectives.
We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs).
We build upon the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics.
We contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection, and a method based on self-supervised learning.
arXiv Detail & Related papers (2023-11-30T15:57:55Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem into a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z)
- Digging into Uncertainty in Self-supervised Multi-view Stereo [57.04768354383339]
We propose a novel Uncertainty reduction Multi-view Stereo (UMVS) framework for self-supervised learning.
Our framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts.
arXiv Detail & Related papers (2021-08-30T02:53:08Z)
- Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [7.188571996124112]
We introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle.
This encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions.
We demonstrate that our approach can achieve SOTA performance on challenging visual control tasks, even when the background is replaced with natural videos (a schematic sketch follows this entry).
arXiv Detail & Related papers (2021-02-26T02:24:36Z)
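To make the bottleneck idea in the last entry concrete, here is a schematic sketch of an information-bottleneck-style auxiliary loss over two views of an observation. It is an illustration under assumed shapes and weights, not the exact MIB objective of that paper.

```python
# Schematic information-bottleneck-style auxiliary loss over two views
# of the same observation. This illustrates the general idea, not the
# exact MIB objective of the paper above; shapes, the agreement term,
# and the trade-off weight are all assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticEncoder(nn.Module):
    """Maps view features to a diagonal Gaussian over latents."""
    def __init__(self, in_dim: int = 128, z_dim: int = 32):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_std = nn.Linear(in_dim, z_dim)

    def forward(self, x: torch.Tensor):
        mu, log_std = self.mu(x), self.log_std(x).clamp(-5.0, 2.0)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterize
        return z, mu, log_std

enc = StochasticEncoder()
view1 = torch.randn(8, 128)  # features of two views of one frame,
view2 = torch.randn(8, 128)  # e.g. two random augmentations

z1, mu1, ls1 = enc(view1)
z2, mu2, ls2 = enc(view2)

# Agreement term: the two views should map to the same latent, so the
# representation keeps what is shared (task signal) across views.
agree = F.mse_loss(z1, mu2.detach()) + F.mse_loss(z2, mu1.detach())

# Compression term: KL to a standard normal prior squeezes out
# view-specific, task-irrelevant detail (the bottleneck).
kl = (-ls1 + 0.5 * (ls1.exp() ** 2 + mu1 ** 2) - 0.5).sum(-1).mean()

beta = 1e-3                   # bottleneck trade-off weight (assumed)
aux_loss = agree + beta * kl  # added to the usual RL losses
```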
This list is automatically generated from the titles and abstracts of the papers on this site.