Contrastive Learning as Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2206.07568v1
- Date: Wed, 15 Jun 2022 14:34:15 GMT
- Title: Contrastive Learning as Goal-Conditioned Reinforcement Learning
- Authors: Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, Sergey Levine
- Abstract summary: In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
- Score: 147.28638631734486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL), it is easier to solve a task if given a good
representation. While deep RL should automatically acquire such good
representations, prior work often finds that learning representations in an
end-to-end fashion is unstable and instead equips RL algorithms with additional
representation learning parts (e.g., auxiliary losses, data augmentation). How
can we design RL algorithms that directly acquire good representations? In this
paper, instead of adding representation learning parts to an existing RL
algorithm, we show (contrastive) representation learning methods can be cast as
RL algorithms in their own right. To do this, we build upon prior work and
apply contrastive representation learning to action-labeled trajectories, in
such a way that the (inner product of) learned representations exactly
corresponds to a goal-conditioned value function. We use this idea to
reinterpret a prior RL method as performing contrastive learning, and then use
the idea to propose a much simpler method that achieves similar performance.
Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL
methods achieve higher success rates than prior non-contrastive methods,
including in the offline RL setting. We also show that contrastive RL
outperforms prior methods on image-based tasks, without using data augmentation
or auxiliary objectives.
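To make the core idea concrete, below is a minimal sketch (not the authors' released code; all network sizes and names are illustrative assumptions) of a contrastive critic: a state-action encoder and a goal encoder are trained so that the inner product of their outputs classifies whether a candidate goal was actually reached later in the same trajectory.

```python
# Minimal sketch of a contrastive goal-conditioned critic (illustrative only).
# Diagonal (state, action, future_state) pairs from a batch are positives;
# goals from other rows of the batch serve as negatives.
import torch
import torch.nn as nn

class ContrastiveCritic(nn.Module):
    def __init__(self, state_dim, action_dim, goal_dim, repr_dim=64):
        super().__init__()
        # phi(s, a): state-action encoder
        self.sa_encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim))
        # psi(g): goal encoder
        self.g_encoder = nn.Sequential(
            nn.Linear(goal_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim))

    def forward(self, state, action, goal):
        phi = self.sa_encoder(torch.cat([state, action], dim=-1))  # (B, d)
        psi = self.g_encoder(goal)                                  # (B, d)
        # Pairwise inner products: entry (i, j) scores goal j for pair i.
        return phi @ psi.t()                                        # (B, B)

def contrastive_loss(critic, state, action, future_state):
    """Binary NCE: diagonal pairs are positives (the goal was actually
    reached after taking the action), off-diagonal pairs are negatives."""
    logits = critic(state, action, future_state)                    # (B, B)
    labels = torch.eye(logits.shape[0], device=logits.device)
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)

# Illustrative usage with random tensors standing in for a replay batch.
B, s_dim, a_dim = 32, 10, 4
critic = ContrastiveCritic(s_dim, a_dim, s_dim)
s, a, g = torch.randn(B, s_dim), torch.randn(B, a_dim), torch.randn(B, s_dim)
loss = contrastive_loss(critic, s, a, g)
loss.backward()
```

Under this training scheme the inner product plays the role of a goal-conditioned value function, so a policy can be improved by preferring actions that maximize it for the commanded goal.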
Related papers
- Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View [51.30152184507165]
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training.
This oft-sought property is one of the few ways in which RL methods based on dynamic programming differ from RL methods based on supervised learning (SL).
It remains unclear whether the SL-based methods forgo this important stitching property.
arXiv Detail & Related papers (2024-01-20T14:23:25Z) - RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, into the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term than RL$^2$, while maintaining data efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z) - Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can serve as a proxy evaluation task for the quality of unsupervised RL representations.
We show that the probing tasks are strongly rank-correlated with downstream RL performance on the Atari100k benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
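As a rough illustration of that evaluation protocol (a sketch under assumed names, not the paper's exact setup), a linear probe freezes the pretrained encoder, trains only a linear classifier on its features, and uses held-out probe accuracy to rank candidate pretraining recipes.

```python
# Hedged sketch of a linear-probe evaluation for frozen representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_score(encode, observations, labels, n_train=800):
    """`encode` maps raw observations to frozen features; only the
    linear probe itself is trained."""
    feats = encode(observations)                 # (N, d) frozen features
    clf = LogisticRegression(max_iter=1000)
    clf.fit(feats[:n_train], labels[:n_train])
    return clf.score(feats[n_train:], labels[n_train:])

# Toy stand-in: random "features" and labels just to show the call pattern.
rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 32))
labs = (obs[:, 0] > 0).astype(int)
print(probe_score(lambda x: x, obs, labs))
```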
arXiv Detail & Related papers (2022-08-25T21:08:01Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but also quickly adapts to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance, which leverages expert demonstrations to improve SRL interpretations.
We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z) - RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, and we also verify previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z) - FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.