Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution
- URL: http://arxiv.org/abs/2302.09601v1
- Date: Sun, 19 Feb 2023 15:47:24 GMT
- Title: Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution
- Authors: Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou,
Shuiwang Ji, Bin Li, Yongdong Zhang, and Feng Wu
- Abstract summary: Generalization in partially observed Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL).
We propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA).
Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves the generalization performance on unseen environments.
- Score: 98.67737684075587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization in partially observed Markov decision processes (POMDPs) is
critical for successful applications of visual reinforcement learning (VRL) in
real scenarios. A widely used idea is to learn task-relevant representations
that encode task-relevant information of common features in POMDPs, i.e.,
rewards and transition dynamics. As transition dynamics in the latent state
space -- which are task-relevant and invariant to visual distractions -- are
unknown to the agents, existing methods instead use transition dynamics in
the observation space to extract that task-relevant information. However,
transition dynamics in the observation space involve
task-irrelevant visual distractions, degrading the generalization performance
of VRL methods. To tackle this problem, we propose the reward sequence
distribution conditioned on the starting observation and the predefined
subsequent action sequence (RSD-OA). The appealing features of RSD-OA include
that: (1) RSD-OA is invariant to visual distractions, as it is conditioned on
the predefined subsequent action sequence without task-irrelevant information
from transition dynamics, and (2) the reward sequence captures long-term
task-relevant information in both rewards and transition dynamics. Experiments
demonstrate that our representation learning approach based on RSD-OA
significantly improves the generalization performance on unseen environments,
outperforming several state-of-the-art methods on DeepMind Control tasks with visual
distractions.
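The core of the approach is straightforward to prototype. Below is a minimal sketch (ours, not the authors' released code) of RSD-OA-style representation learning in PyTorch: an encoder maps the starting observation to a latent, and a prediction head conditioned on that latent plus a predefined action sequence outputs a distribution over the n-step reward sequence; training maximizes the likelihood of observed reward sequences, so the encoder is shaped to keep only reward-predictive information. All module names, layer sizes, and the diagonal-Gaussian parameterization here are illustrative assumptions.

# Minimal sketch (not the authors' code) of RSD-OA-style representation
# learning: the latent of the starting observation, together with a
# predefined action sequence, predicts the distribution of the ensuing
# n-step reward sequence.
import torch
import torch.nn as nn

class RSDOAModel(nn.Module):
    def __init__(self, obs_dim, act_dim, horizon, latent_dim=64):
        super().__init__()
        self.horizon = horizon
        # Encoder: observation -> task-relevant latent (shared with the policy).
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Head: (latent, flattened action sequence) -> Gaussian over the
        # n-step reward sequence (diagonal covariance, an assumption here).
        self.head = nn.Sequential(
            nn.Linear(latent_dim + act_dim * horizon, 256), nn.ReLU(),
            nn.Linear(256, 2 * horizon),  # per-step mean and log-std
        )

    def forward(self, obs, action_seq):
        z = self.encoder(obs)                # (B, latent_dim)
        a = action_seq.flatten(start_dim=1)  # (B, act_dim * horizon)
        mean, log_std = self.head(torch.cat([z, a], dim=-1)).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.exp())

def rsd_oa_loss(model, obs, action_seq, reward_seq):
    # Negative log-likelihood of the observed reward sequence under the
    # predicted distribution; minimizing it shapes the encoder toward
    # reward-predictive (task-relevant) features.
    dist = model(obs, action_seq)
    return -dist.log_prob(reward_seq).sum(dim=-1).mean()

# Usage on a dummy batch: 5-step predefined action sequences.
model = RSDOAModel(obs_dim=32, act_dim=4, horizon=5)
obs = torch.randn(8, 32)
actions = torch.randn(8, 5, 4)   # predefined, policy-independent actions
rewards = torch.randn(8, 5)      # rewards observed after executing them
loss = rsd_oa_loss(model, obs, actions, rewards)
loss.backward()

Because the conditioning action sequence is fixed in advance rather than generated by the policy, the prediction target carries no task-irrelevant observation-space dynamics, which is the property the abstract credits for invariance to visual distractions.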
Related papers
- Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent [72.1517476116743]
Recent MLLMs have shown emerging visual understanding and reasoning abilities after being pre-trained on large-scale multimodal datasets, but instruction-tuning can cause them to forget previously learned visual knowledge.
Existing approaches, such as direct fine-tuning and continual learning methods, fail to explicitly address this forgetting.
We introduce a novel perspective that leverages effective rank to quantify visual representation forgetting.
We propose a modality-decoupled gradient descent (MDGD) method that regulates gradient updates to maintain the effective rank of visual representations.
arXiv Detail & Related papers (2025-02-17T12:26:34Z)
- Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning [0.0]
Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning.
In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information.
We propose the Salience-Invariant Consistent Policy Learning algorithm, an efficient framework for zero-shot generalization.
arXiv Detail & Related papers (2025-02-12T12:00:16Z)
- Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
Improving the quality of scene representations is a key issue in vision-oriented decision-making applications.
We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
- STAT: Towards Generalizable Temporal Action Localization [56.634561073746056]
Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels.
Existing methods suffer from severe performance degradation when transferring to different distributions.
We propose GTAL, which focuses on improving the generalizability of action localization methods.
arXiv Detail & Related papers (2024-04-20T07:56:21Z)
- Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
- Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract task-relevant information (see the brief characteristic-function refresher after this list).
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
- Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models complex, non-linearly evolving user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z)
- Transfer RL across Observation Feature Spaces via Model-Based Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
arXiv Detail & Related papers (2022-01-01T22:41:19Z)
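As an aside for the CRESP entry above: a reward sequence distribution is awkward to regress directly, and the CRESP title points to the standard characteristic-function device. For an n-step reward sequence R = (r_1, ..., r_n), the characteristic function is (notation ours, not the paper's):

\varphi_R(u) = \mathbb{E}\!\left[ \exp\!\left( i\, u^{\top} R \right) \right], \qquad u \in \mathbb{R}^{n}, \quad R = (r_1, \dots, r_n).

Since \varphi_R uniquely determines the distribution of R, a predictor can be trained to match \varphi_R(u) at sampled points u instead of modeling the full distribution itself.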