Related papers: Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning

Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning

URL: http://arxiv.org/abs/2502.08336v1
Date: Wed, 12 Feb 2025 12:00:16 GMT
Title: Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning
Authors: Sun Jingbo, Tu Songjun, Zhang Qichao, Chen Ke, Zhao Dongbin,
Abstract summary: Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning.<n>In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information.<n>We propose the Salience-Invariant Consistent Policy Learning algorithm, an efficient framework for zero-shot generalization.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning, where agents often overfit to the specific visual observations of the training environment. In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information. As a result, agents may deviate from the optimal behaviors learned during training, thereby hindering visual generalization.To address this issue, we propose the Salience-Invariant Consistent Policy Learning (SCPL) algorithm, an efficient framework for zero-shot generalization. Our approach introduces a novel value consistency module alongside a dynamics module to effectively capture task-relevant representations. The value consistency module, guided by saliency, ensures the agent focuses on task-relevant pixels in both original and perturbed observations, while the dynamics module uses augmented data to help the encoder capture dynamic- and reward-relevant representations. Additionally, our theoretical analysis highlights the importance of policy consistency for generalization. To strengthen this, we introduce a policy consistency module with a KL divergence constraint to maintain consistent policies across original and perturbed observations.Extensive experiments on the DMC-GB, Robotic Manipulation, and CARLA benchmarks demonstrate that SCPL significantly outperforms state-of-the-art methods in terms of generalization. Notably, SCPL achieves average performance improvements of 14\%, 39\%, and 69\% in the challenging DMC video hard setting, the Robotic hard setting, and the CARLA benchmark, respectively.Project Page: https://sites.google.com/view/scpl-rl.

Related papers

Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation [51.645152962504056]
In semi-supervised semantic segmentation, data augmentation plays a crucial role in the weak-to-strong consistency regularization framework.<n>We show that spatial augmentation can contribute to model training in SSSS, despite generating inconsistent masks between the weak and strong augmentations.<n>We propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy.
arXiv Detail & Related papers (2025-05-29T13:35:48Z)
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments. We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets. We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z)
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents [6.402396836189286]
We present a novel contrastive prompt ensemble (ConPE) framework for embodied reinforcement learning. We devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. In experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks.
arXiv Detail & Related papers (2024-12-16T06:53:00Z)
Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications. We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
Learning Latent Dynamic Robust Representations for World Models [9.806852421730165]
Visual Model-Based Reinforcement Learning (MBL) promises to agent's knowledge about the underlying dynamics of the environment. Top-temporal agents such as Dreamer often struggle with visual pixel-based inputs in the presence of irrelevant noise in the observation space. We apply a-temporal masking strategy, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models.
arXiv Detail & Related papers (2024-05-10T06:28:42Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning. We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP) ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective. We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution [98.67737684075587]
Generalization in partially observed markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL) We propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA) Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves the generalization performance on unseen environments.
arXiv Detail & Related papers (2023-02-19T15:47:24Z)
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality. Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning. We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information. Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning [15.698612710580447]
We propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization. In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation. We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv Detail & Related papers (2022-02-28T09:05:14Z)
Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs. We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.