Cross-Trajectory Representation Learning for Zero-Shot Generalization in
RL
- URL: http://arxiv.org/abs/2106.02193v1
- Date: Fri, 4 Jun 2021 00:43:10 GMT
- Title: Cross-Trajectory Representation Learning for Zero-Shot Generalization in
RL
- Authors: Bogdan Mazoure, Ahmed M. Ahmed, Patrick MacAlpine, R Devon Hjelm,
Andrey Kolobov
- Abstract summary: generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
- Score: 21.550201956884532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A highly desirable property of a reinforcement learning (RL) agent -- and a
major difficulty for deep RL approaches -- is the ability to generalize
policies learned on a few tasks over a high-dimensional observation space to
similar tasks not seen during training. Many promising approaches to this
challenge consider RL as a process of training two functions simultaneously: a
complex nonlinear encoder that maps high-dimensional observations to a latent
representation space, and a simple linear policy over this space. We posit that
a superior encoder for zero-shot generalization in RL can be trained by using
solely an auxiliary SSL objective if the training process encourages the
encoder to map behaviorally similar observations to similar representations, as
reward-based signal can cause overfitting in the encoder (Raileanu et al.,
2021). We propose Cross-Trajectory Representation Learning (CTRL), a method
that runs within an RL agent and conditions its encoder to recognize behavioral
similarity in observations by applying a novel SSL objective to pairs of
trajectories from the agent's policies. CTRL can be viewed as having the same
effect as inducing a pseudo-bisimulation metric but, crucially, avoids the use
of rewards and associated overfitting risks. Our experiments ablate various
components of CTRL and demonstrate that in combination with PPO it achieves
better generalization performance on the challenging Procgen benchmark suite
(Cobbe et al., 2020).
Related papers
- SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning [33.125187822259186]
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
arXiv Detail & Related papers (2023-11-03T16:19:33Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z) - Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representation.
We show that the probing tasks are strongly rank correlated with the downstream RL performance on the Atari100k Benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
arXiv Detail & Related papers (2022-08-25T21:08:01Z) - Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Improving Zero-shot Generalization in Offline Reinforcement Learning
using Generalized Similarity Functions [34.843526573355746]
Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but exhibit difficulty in generalizing to scenarios not seen during training.
We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
arXiv Detail & Related papers (2021-11-29T15:42:54Z) - Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z) - Learning Temporally-Consistent Representations for Data-Efficient
Reinforcement Learning [3.308743964406687]
$k$-Step Latent (KSL) is a representation learning method that enforces temporal consistency of representations.
KSL produces encoders that generalize better to new tasks unseen during training.
arXiv Detail & Related papers (2021-10-11T00:16:43Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.