Sequential Action-Induced Invariant Representation for Reinforcement Learning
- URL: http://arxiv.org/abs/2309.12628v1
- Date: Fri, 22 Sep 2023 05:31:55 GMT
- Title: Sequential Action-Induced Invariant Representation for Reinforcement Learning
- Authors: Dayang Liang, Qihang Chen and Yunlong Liu
- Abstract summary: How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
- Score: 1.2046159151610263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to accurately learn task-relevant state representations from
high-dimensional observations with visual distractions is a realistic and
challenging problem in visual reinforcement learning. Recently, unsupervised
representation learning methods based on bisimulation metrics, contrast,
prediction, and reconstruction have shown the ability for task-relevant
information extraction. However, due to the lack of appropriate mechanisms for
the extraction of task information in the prediction, contrast, and
reconstruction-related approaches and the limitations of bisimulation-related
methods in domains with sparse rewards, it is still difficult for these methods
to be effectively extended to environments with distractions. To alleviate
these problems, in this paper, action sequences, which carry task-intensive
signals, are incorporated into representation learning.
Specifically, we propose a Sequential Action-induced invariant Representation
(SAR) method, in which the encoder is optimized by an auxiliary learner to only
preserve the components that follow the control signals of sequential actions,
so that the agent is induced to learn representations that are robust against
distractions. We conduct extensive experiments on DeepMind Control suite
tasks with distractions, where SAR achieves the best performance over strong
baselines. We also demonstrate the effectiveness of our method at disregarding
task-irrelevant information by deploying SAR to realistic CARLA-based
autonomous driving with natural distractions. Finally, we analyze
generalization through generalization-decay measurements and t-SNE
visualizations. Code and demo videos are available at
https://github.com/DMU-XMU/SAR.git.
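As a concrete illustration of the mechanism described in the abstract, here is a minimal PyTorch-style sketch: an auxiliary head is trained to recover the sequence of actions connecting two encoded observations, which pushes the encoder to retain only action-controllable components and to discard distractors. All class names, network sizes, and the MSE objective below are illustrative assumptions, not the authors' exact architecture or loss.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a (flattened) observation to a compact latent state."""
    def __init__(self, obs_dim, latent_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class SequentialActionHead(nn.Module):
    """Auxiliary learner: predicts the action sequence linking a window
    of latent states; features carrying no action signal (distractors)
    are useless for this prediction and get pruned from the encoder."""
    def __init__(self, latent_dim, action_dim, seq_len):
        super().__init__()
        self.seq_len, self.action_dim = seq_len, action_dim
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, seq_len * action_dim),
        )

    def forward(self, z_start, z_end):
        out = self.net(torch.cat([z_start, z_end], dim=-1))
        return out.view(-1, self.seq_len, self.action_dim)

def auxiliary_loss(encoder, head, obs_seq, act_seq):
    """obs_seq: (B, T+1, obs_dim); act_seq: (B, T, action_dim)."""
    z_start = encoder(obs_seq[:, 0])
    z_end = encoder(obs_seq[:, -1])
    pred = head(z_start, z_end)
    return nn.functional.mse_loss(pred, act_seq)
```

In practice such a loss would be minimized jointly with the RL objective, and image observations would pass through a convolutional encoder rather than the MLP used here for brevity.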
Related papers
- Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions [14.274653873720334]
We propose a simple yet effective auxiliary task to facilitate representation learning in distracting environments.
Under the assumption that task-relevant components of image observations are straightforward to identify with prior knowledge, we use a segmentation mask on image observations to reconstruct only task-relevant components.
In modified DeepMind Control suite (DMC) and Meta-World tasks with added visual distractions, the proposed method (SD) achieves significantly better sample efficiency and greater final performance than prior work.
arXiv Detail & Related papers (2024-10-13T19:24:07Z)
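To make the masking idea in the entry above concrete: given a segmentation mask of task-relevant pixels, the reconstruction error can simply be restricted to the masked region, so the decoder is never rewarded for modeling distractors. This is a hedged sketch of that general recipe (the decoder and shapes are assumptions, not the paper's exact formulation):

```python
import torch

def masked_reconstruction_loss(decoder, latent, obs, mask):
    """Reconstruct only the task-relevant pixels of an observation.

    obs:  (B, C, H, W) image observations
    mask: (B, 1, H, W) binary mask of task-relevant regions
    """
    recon = decoder(latent)                      # (B, C, H, W)
    # Errors on distractor pixels are zeroed out by the mask.
    return ((recon - obs) ** 2 * mask).mean()
```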
- DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction [4.813546138483559]
Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data.
In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods.
We propose a novel method, Disentangled Environment and Agent Representations, that uses the segmentation mask of the agent as supervision.
arXiv Detail & Related papers (2024-06-30T09:15:21Z) - Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning [73.53576440536682]
We introduce TACO: Temporal Action-driven Contrastive Learning, a powerful temporal contrastive learning approach.
TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states.
For online RL, TACO achieves 40% performance boost after one million environment interaction steps.
arXiv Detail & Related papers (2023-06-22T22:21:53Z)
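The mutual-information objective in the TACO entry above is typically approximated with an InfoNCE-style contrastive loss. Below is a hedged sketch under that assumption; the projection network, embedding shapes, and temperature are illustrative, not TACO's exact design.

```python
import torch
import torch.nn.functional as F

def taco_style_infonce(z_t, a_seq_emb, z_future, proj, temperature=0.1):
    """InfoNCE between (state, action-sequence) pairs and future states.

    z_t:       (B, D) current-state embeddings
    a_seq_emb: (B, D) embeddings of the action sequence a_t .. a_{t+K-1}
    z_future:  (B, D) embeddings of the states K steps ahead
    proj:      network mapping the concatenated pair back to dimension D
    """
    query = F.normalize(proj(torch.cat([z_t, a_seq_emb], dim=-1)), dim=-1)
    keys = F.normalize(z_future, dim=-1)
    logits = query @ keys.t() / temperature      # (B, B) similarity matrix
    # Matching pairs lie on the diagonal; all other entries are negatives.
    labels = torch.arange(z_t.size(0), device=z_t.device)
    return F.cross_entropy(logits, labels)
```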
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL).
Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution [98.67737684075587]
Generalization in partially observed Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL).
We propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA).
Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves the generalization performance on unseen environments.
arXiv Detail & Related papers (2023-02-19T15:47:24Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information.
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
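The residual formulation in the entry above is simple to state: the executed action is the sum of a fixed conventional controller's output and a learned correction trained by RL. A minimal sketch of that standard decomposition (function names and the clipping convention are illustrative):

```python
import torch

def residual_action(base_controller, residual_policy, obs, low, high):
    """Executed action = feedback-controller action + learned residual."""
    a_base = base_controller(obs)   # fixed conventional controller (not trained)
    a_res = residual_policy(obs)    # trained with RL to maximize reward
    return torch.clamp(a_base + a_res, low, high)
```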
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
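For reference, the bisimulation metric used in the entry above (and in the main abstract's discussion of bisimulation-based methods) is commonly defined as the fixed point of a recursion of roughly this form; the exact weighting of the two terms varies across papers, so treat this as one standard formulation rather than the definitive one:

```latex
d(s_i, s_j) = \max_{a \in \mathcal{A}} \Big( \big|\, r(s_i, a) - r(s_j, a) \,\big|
  + \gamma \, W_1\!\big( \mathcal{P}(\cdot \mid s_i, a),\; \mathcal{P}(\cdot \mid s_j, a);\, d \big) \Big)
```

Here W_1(·, ·; d) is the 1-Wasserstein distance between next-state distributions measured under d itself, so states that earn similar rewards and transition to behaviorally similar states end up close in the learned representation.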
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.