Accelerating Representation Learning with View-Consistent Dynamics in
Data-Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2201.07016v1
- Date: Tue, 18 Jan 2022 14:28:30 GMT
- Authors: Tao Huang, Jiachen Wang, Xiao Chen
- Abstract summary: We propose to accelerate state representation learning by enforcing view-consistency on the dynamics.
We introduce a formalism of Multi-view Markov Decision Process (MMDP) that incorporates multiple views of the state.
Following the structure of MMDP, our method, View-Consistent Dynamics (VCD), learns state representations by training a view-consistent dynamics model in the latent space.
- Score: 12.485293708638292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning informative representations from image-based observations is of
fundamental concern in deep Reinforcement Learning (RL). However,
data-inefficiency remains a significant barrier to this objective. To overcome
this obstacle, we propose to accelerate state representation learning by
enforcing view-consistency on the dynamics. Firstly, we introduce a formalism
of Multi-view Markov Decision Process (MMDP) that incorporates multiple views
of the state. Following the structure of MMDP, our method, View-Consistent
Dynamics (VCD), learns state representations by training a view-consistent
dynamics model in the latent space, where views are generated by applying data
augmentation to states. Empirical evaluation on DeepMind Control Suite and
Atari-100k demonstrates VCD to be the SoTA data-efficient algorithm on visual
control tasks.
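The abstract's core idea, training a latent dynamics model whose predictions agree across augmented views of the same state, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the linear encoder and dynamics model, the additive-noise stand-in for image augmentation, and all dimensions below are hypothetical choices made only to show the shape of the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not specify these.
OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 4, 2

# Shared parameters: a linear stand-in encoder and a linear latent dynamics model.
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) * 0.1

def encode(obs):
    # Maps an observation to its latent state representation.
    return W_enc @ obs

def predict_next(z, action):
    # Latent-space dynamics: predicts the next latent from (latent, action).
    return W_dyn @ np.concatenate([z, action])

def augment(obs):
    # Stand-in for image data augmentation (e.g. a random shift): small noise.
    return obs + rng.normal(scale=0.01, size=obs.shape)

def view_consistent_dynamics_loss(obs, action, next_obs):
    # Two views of the same state, generated by data augmentation.
    z1, z2 = encode(augment(obs)), encode(augment(obs))
    target = encode(next_obs)  # latent of the true next state
    pred1, pred2 = predict_next(z1, action), predict_next(z2, action)
    # Dynamics prediction error for each view, plus a cross-view
    # consistency term forcing the two predictions to agree.
    dyn_loss = np.mean((pred1 - target) ** 2) + np.mean((pred2 - target) ** 2)
    consistency = np.mean((pred1 - pred2) ** 2)
    return dyn_loss + consistency

obs, next_obs = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
action = rng.normal(size=ACTION_DIM)
loss = view_consistent_dynamics_loss(obs, action, next_obs)
```

In a real instantiation the encoder would be a convolutional network over pixels and the loss would be minimized by gradient descent; the point of the sketch is that both views share one encoder and one dynamics model, so minimizing the loss pushes the representation to be invariant to the augmentation.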
Related papers
- Dynamical-VAE-based Hindsight to Learn the Causal Dynamics of Factored-POMDPs [9.662551514840388]
We introduce a Dynamical Variational Auto-Encoder (DVAE) designed to learn causal Markovian dynamics from offline trajectories.
Our method employs an extended hindsight framework that integrates past, current, and multi-step future information.
Empirical results reveal that this approach uncovers the causal graph governing hidden state transitions more effectively than history-based and typical hindsight-based models.
arXiv Detail & Related papers (2024-11-12T14:27:45Z)
- MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning [8.61492882526007]
In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges on sample efficiency.
We introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking.
Our evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency.
arXiv Detail & Related papers (2024-09-02T18:57:53Z)
- Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications.
We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
- Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce out-of-distribution gap for RL based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Learning Generalizable Representations for Reinforcement Learning via Adaptive Meta-learner of Behavioral Similarities [43.327357653393015]
We propose a novel meta-learner-based framework for representation learning regarding behavioral similarities for reinforcement learning.
We empirically demonstrate that our proposed framework outperforms state-of-the-art baselines on several benchmarks.
arXiv Detail & Related papers (2022-12-26T11:11:23Z)
- Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations [58.758928936316785]
Offline reinforcement learning from visual observations with continuous action spaces remains under-explored.
We show that modifications to two popular vision-based online reinforcement learning algorithms suffice to outperform existing offline RL methods.
arXiv Detail & Related papers (2022-06-09T22:08:47Z)
- Which Mutual-Information Representation Learning Objectives are Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data.
This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy.
Surprisingly, we find that two of these objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP.
arXiv Detail & Related papers (2021-06-14T10:12:34Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatial-temporal discontinuity; (iii) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.