Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.12542v1
- Date: Sat, 25 Jun 2022 03:02:25 GMT
- Title: Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning
- Authors: Yang Yue, Bingyi Kang, Zhongwen Xu, Gao Huang, Shuicheng Yan
- Abstract summary: We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
Our methods are demonstrated to achieve new state-of-the-art performance among search-free RL algorithms.
- Score: 105.70602423944148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) algorithms suffer severe performance
degradation when the interaction data is scarce, which limits their real-world
application. Recently, visual representation learning has been shown to be
effective and promising for boosting sample efficiency in RL. These methods
usually rely on contrastive learning and data augmentation to train a
transition model for state prediction, which differs from how the model is
used in RL, namely for value-based planning. As a result, the learned model may
not align well with the environment or generate consistent value
predictions, especially when the state transition is not deterministic. To
address this issue, we propose a novel method, called value-consistent
representation learning (VCR), to learn representations that are directly
related to decision-making. More specifically, VCR trains a model to predict
the future state (also referred to as the "imagined state") based on the
current one and a sequence of actions. Instead of aligning this imagined state
with a real state returned by the environment, VCR applies a $Q$-value head on
both states and obtains two distributions of action values. Then a distance is
computed and minimized to force the imagined state to produce an action-value
prediction similar to that of the real state. We develop two implementations of
the above idea for the discrete and continuous action spaces respectively. We
conduct experiments on Atari 100K and DeepMind Control Suite benchmarks to
validate their effectiveness for improving sample efficiency. Our methods are
demonstrated to achieve new state-of-the-art performance among search-free RL
algorithms.
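
To make the idea concrete, below is a minimal PyTorch-style sketch of a value-consistency loss of the kind the abstract describes: a latent transition model rolls the encoded state forward through an action sequence, a shared Q-value head scores both the imagined and the real future state, and the distance between the two action-value distributions is minimized. All module names, dimensions, and the choice of KL divergence as the distance are illustrative assumptions; the paper's actual architecture and loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VCRSketch(nn.Module):
    def __init__(self, obs_dim=64, act_dim=6, latent_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)                  # state -> latent
        self.transition = nn.Linear(latent_dim + act_dim, latent_dim)  # latent dynamics
        self.q_head = nn.Linear(latent_dim, act_dim)                   # latent -> Q-values

    def value_consistency_loss(self, obs_t, actions, real_next_obs):
        # Roll the encoded state forward through the action sequence to get
        # the "imagined" future state.
        z = self.encoder(obs_t)
        for a in actions.unbind(dim=1):                # actions: (batch, k, act_dim)
            z = self.transition(torch.cat([z, a], dim=-1))
        q_imagined = self.q_head(z)
        # Score the real future state returned by the environment with the
        # same Q-value head (treated as a fixed target here).
        with torch.no_grad():
            q_real = self.q_head(self.encoder(real_next_obs))
        # Minimise the distance between the two action-value distributions;
        # KL over a softmax is used here purely as an illustrative choice.
        return F.kl_div(F.log_softmax(q_imagined, dim=-1),
                        F.softmax(q_real, dim=-1),
                        reduction="batchmean")

# Toy usage with random tensors (batch of 32, 3-step action sequence).
model = VCRSketch()
loss = model.value_consistency_loss(torch.randn(32, 64),
                                    torch.randn(32, 3, 6),
                                    torch.randn(32, 64))
loss.backward()
```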
Related papers
- MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning [8.61492882526007]
In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges for sample efficiency.
We introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking.
Our evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency.
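
As a loose sketch of what a masked temporal contrastive objective can look like (not the MOOSS implementation; the encoder, the masking scheme, and the InfoNCE form are assumptions), temporally adjacent observations are treated as positive pairs while other observations in the batch serve as negatives:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(64, 32)   # placeholder encoder: pixel features -> embedding

def temporal_contrastive_loss(obs_t, obs_t_plus_1, mask_prob=0.5, temperature=0.1):
    # Crude stand-in for spatio-temporal masking: randomly zero out features
    # of the current observation before encoding it.
    mask = (torch.rand_like(obs_t) > mask_prob).float()
    z_anchor = F.normalize(encoder(obs_t * mask), dim=-1)
    z_positive = F.normalize(encoder(obs_t_plus_1), dim=-1)
    logits = z_anchor @ z_positive.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(obs_t.size(0))               # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = temporal_contrastive_loss(torch.randn(16, 64), torch.randn(16, 64))
loss.backward()
```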
arXiv Detail & Related papers (2024-09-02T18:57:53Z)
DualView: Data Attribution from the Dual Perspective [16.083769847895336]
We present DualView, a novel method for post-hoc data attribution based on surrogate modelling.
We find that DualView requires considerably lower computational resources than other methods, while demonstrating comparable performance across evaluation metrics.
arXiv Detail & Related papers (2024-02-19T13:13:16Z)
Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
Model Predictive Control with Self-supervised Representation Learning [13.225264876433528]
We propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation.
Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks.
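
As a rough illustration of what adding such a reconstruction term can look like, here is a minimal PyTorch sketch: a latent-consistency loss of the kind used in TD-MPC-style methods is combined with a decoder that reconstructs the original observation from the latent. The encoder, decoder, shapes, and loss weight are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(64, 32)       # observation -> latent
decoder = nn.Linear(32, 64)       # latent -> reconstructed observation
dynamics = nn.Linear(32 + 4, 32)  # (latent, action) -> predicted next latent

def total_loss(obs, action, next_obs, recon_weight=1.0):
    z = encoder(obs)
    z_pred = dynamics(torch.cat([z, action], dim=-1))
    # Latent consistency: the predicted next latent should match the encoded
    # next observation (stop-gradient on the target).
    consistency = F.mse_loss(z_pred, encoder(next_obs).detach())
    # Added reconstruction term: the latent should retain enough information
    # to reconstruct the original observation.
    reconstruction = F.mse_loss(decoder(z), obs)
    return consistency + recon_weight * reconstruction

loss = total_loss(torch.randn(8, 64), torch.randn(8, 4), torch.randn(8, 64))
loss.backward()
```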
arXiv Detail & Related papers (2023-04-14T16:02:04Z)
Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
Knowing the Past to Predict the Future: Reinforcement Virtual Learning [29.47688292868217]
Reinforcement Learning (RL)-based control systems have received considerable attention in recent decades.
In this paper, we present a cost-efficient framework in which the RL model can evolve on its own in a Virtual Space.
The proposed framework enables a step-by-step RL model to predict future states and select optimal actions for long-horizon decisions.
arXiv Detail & Related papers (2022-11-02T16:48:14Z)
Visual processing in context of reinforcement learning [0.0]
This thesis introduces three different representation learning algorithms that have access to different subsets of the data sources that traditional RL algorithms use.
We conclude that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.
arXiv Detail & Related papers (2022-08-26T09:30:51Z)
PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning [84.30765628008207]
We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
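
The cycle-consistency idea can be sketched roughly as follows: a forward dynamics model rolls a latent state through a sequence of (virtual) actions, a backward model rolls it back, and the result is constrained to match the starting latent. Module names, shapes, and the MSE penalty are illustrative assumptions rather than the PlayVirtual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, act_dim = 32, 4
forward_model = nn.Linear(latent_dim + act_dim, latent_dim)   # forward latent dynamics
backward_model = nn.Linear(latent_dim + act_dim, latent_dim)  # backward latent dynamics

def cycle_loss(z0, virtual_actions):
    # virtual_actions: (B, K, act_dim) sampled actions forming a virtual trajectory.
    z = z0
    for a in virtual_actions.unbind(dim=1):
        z = forward_model(torch.cat([z, a], dim=-1))
    for a in reversed(virtual_actions.unbind(dim=1)):
        z = backward_model(torch.cat([z, a], dim=-1))
    # The virtual trajectory should cycle back to the original latent state.
    return F.mse_loss(z, z0)

loss = cycle_loss(torch.randn(16, latent_dim), torch.randn(16, 5, act_dim))
loss.backward()
```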
arXiv Detail & Related papers (2021-06-08T07:37:37Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)