Generalization in Visual Reinforcement Learning with the Reward Sequence
Distribution
- URL: http://arxiv.org/abs/2302.09601v1
- Date: Sun, 19 Feb 2023 15:47:24 GMT
- Title: Generalization in Visual Reinforcement Learning with the Reward Sequence
Distribution
- Authors: Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou,
Shuiwang Ji, Bin Li, Yongdong Zhang, and Feng Wu
- Abstract summary: Generalization in partially observable Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL).
We propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA).
Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves the generalization performance on unseen environments.
- Score: 98.67737684075587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization in partially observable Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL) in real scenarios. A widely used idea is to learn task-relevant representations that encode the task-relevant information in features common to POMDPs, i.e., rewards and transition dynamics. As transition dynamics in the latent state space, which are task-relevant and invariant to visual distractions, are unknown to the agents, existing methods instead use transition dynamics in the observation space to extract task-relevant information. However, transition dynamics in the observation space involve task-irrelevant visual distractions, degrading the generalization performance of VRL methods. To tackle this problem, we propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA). The appealing features of RSD-OA include: (1) RSD-OA is invariant to visual distractions, as it is conditioned on a predefined action sequence rather than on transition dynamics that carry task-irrelevant information, and (2) the reward sequence captures long-term task-relevant information in both rewards and transition dynamics. Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves generalization performance on unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with visual distractions.
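The abstract describes RSD-OA operationally: encode the starting observation, condition on a predefined action sequence, and fit the distribution of the resulting reward sequence as an auxiliary objective, so that the encoder is shaped only by reward-relevant information. Below is a minimal PyTorch sketch of that idea under stated assumptions: the module layout, the dimensions, the flat observation vector (a CNN over pixels would replace it in VRL), and the diagonal-Gaussian reward head are all illustrative choices, not the authors' implementation.

    import torch
    import torch.nn as nn

    class RSDOAAuxiliary(nn.Module):
        """Sketch of an RSD-OA-style auxiliary objective (assumed design)."""

        def __init__(self, obs_dim, act_dim, horizon, latent_dim=64):
            super().__init__()
            self.horizon = horizon
            # Encoder: starting observation -> task-relevant latent code.
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, latent_dim),
            )
            # Head: (latent code, flattened action sequence) -> mean and
            # log-std of a per-step Gaussian over the reward sequence.
            self.head = nn.Sequential(
                nn.Linear(latent_dim + act_dim * horizon, 256), nn.ReLU(),
                nn.Linear(256, 2 * horizon),
            )

        def loss(self, obs, action_seq, reward_seq):
            # obs: (B, obs_dim); action_seq: (B, horizon, act_dim);
            # reward_seq: (B, horizon), rewards observed after executing
            # action_seq from the state underlying obs.
            z = self.encoder(obs)
            params = self.head(torch.cat([z, action_seq.flatten(1)], dim=-1))
            mean, log_std = params.chunk(2, dim=-1)
            # Gaussian negative log-likelihood, up to an additive constant.
            nll = 0.5 * ((reward_seq - mean) / log_std.exp()) ** 2 + log_std
            return nll.sum(dim=-1).mean()

The key design point follows the abstract: because the action sequence is fixed in advance rather than produced by a policy reacting to future observations, background distractors cannot leak into the conditioning variables, and only reward-relevant structure can reduce the loss.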
Related papers
- Unsupervised Representation Learning of Complex Time Series for Maneuverability State Identification in Smart Mobility [0.0]
In smart mobility, multivariate time series (MTS) data play a crucial role in capturing the temporal dynamics of behaviors such as maneuver patterns.
In this work, we aim to address challenges associated with modeling MTS data collected from a vehicle using sensors.
Our goal is to investigate the effectiveness of two distinct unsupervised representation learning approaches in identifying maneuvering states in smart mobility.
arXiv Detail & Related papers (2024-08-26T15:16:18Z)
- Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
How to improve the quality of scene representations is a key issue in vision-oriented decision-making applications.
We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
- STAT: Towards Generalizable Temporal Action Localization [56.634561073746056]
Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels.
Existing methods suffer from severe performance degradation when transferring to different distributions.
We propose GTAL, which focuses on improving the generalizability of action localization methods.
arXiv Detail & Related papers (2024-04-20T07:56:21Z)
- Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
- Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information; the characteristic-function device is sketched after this entry.
Experiments demonstrate that CRESP significantly improves generalization performance on unseen environments.
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
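A characteristic function uniquely determines a distribution, so regressing it lets a network capture the whole reward sequence distribution rather than only its mean. In rough notation for this setting (illustrative, not taken from the paper), the target object is

    \psi_{o,\, a_{1:K}}(\theta)
        = \mathbb{E}\!\left[ \exp\!\Big( i \sum_{k=1}^{K} \theta_k r_k \Big)
          \,\Big|\, o_0 = o,\; a_{1:K} \right],
    \qquad \theta \in \mathbb{R}^{K},

where r_1, ..., r_K are the rewards collected after executing the predefined actions a_{1:K} from the state underlying observation o, and the representation is trained to predict \psi at sampled frequency vectors \theta.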
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information; a standard empowerment objective is sketched after this entry.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
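For context, empowerment is the maximal mutual information between an agent's actions and the resulting next state; variational treatments replace the intractable mutual information with a Barber-Agakov-style lower bound using a learned inference model q(a | s', s). A standard form (notation illustrative; the paper's exact objective may differ) is

    \mathcal{E}(s) \;=\; \max_{\omega(a \mid s)} I(A;\, S' \mid S = s)
        \;\ge\; \max_{\omega}\; \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s,\, a)}
          \big[ \log q(a \mid s',\, s) - \log \omega(a \mid s) \big].

Maximizing such a term biases the learned state space toward features the agent can influence, which is what suppresses uncontrollable natural video backgrounds.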
- Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models complex and non-linearly evolving dynamic user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z)
- Transfer RL across Observation Feature Spaces via Model-Based Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
arXiv Detail & Related papers (2022-01-01T22:41:19Z)