Improving Zero-shot Generalization in Offline Reinforcement Learning
using Generalized Similarity Functions
- URL: http://arxiv.org/abs/2111.14629v1
- Date: Mon, 29 Nov 2021 15:42:54 GMT
- Title: Improving Zero-shot Generalization in Offline Reinforcement Learning
using Generalized Similarity Functions
- Authors: Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
- Abstract summary: Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but exhibit difficulty in generalizing to scenarios not seen during training.
We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
- Score: 34.843526573355746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) agents are widely used for solving complex
sequential decision making tasks, but still exhibit difficulty in generalizing
to scenarios not seen during training. While prior online approaches
demonstrated that using additional signals beyond the reward function can lead
to better generalization capabilities in RL agents, i.e. using self-supervised
learning (SSL), they struggle in the offline RL setting, i.e. learning from a
static dataset. We show that performance of online algorithms for
generalization in RL can be hindered in the offline setting due to poor
estimation of similarity between observations. We propose a new
theoretically-motivated framework called Generalized Similarity Functions
(GSF), which uses contrastive learning to train an offline RL agent to
aggregate observations based on the similarity of their expected future
behavior, where we quantify this similarity using generalized value
functions. We show that GSF is general enough to recover existing SSL
objectives while also improving zero-shot generalization performance on a
complex offline RL benchmark, offline Procgen.
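
The abstract describes the mechanism only at a high level, so a minimal sketch may help fix ideas: an InfoNCE-style contrastive loss in which two observations count as a positive pair when their generalized value function (GVF) estimates are close. This is a hedged illustration of that reading, not the authors' released implementation; the names ObsEncoder, gvf_similarity_loss, tau, and pos_threshold are assumptions made for exposition.

```python
# Illustrative sketch only: contrastive aggregation of observations whose
# generalized value function (GVF) estimates are similar. Not the GSF paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObsEncoder(nn.Module):
    """Maps observations to unit-norm embeddings (hypothetical architecture)."""
    def __init__(self, obs_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(obs), dim=-1)

def gvf_similarity_loss(z: torch.Tensor, gvf: torch.Tensor,
                        tau: float = 0.1, pos_threshold: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss where observations with close GVF estimates act as positives.

    z:   (B, D) unit-norm embeddings of a batch of observations.
    gvf: (B, K) generalized value function estimates for the same observations.
    """
    B = z.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=z.device)

    # Behavioral similarity: pairs whose GVF vectors are close are treated as positives.
    pos_mask = ((torch.cdist(gvf, gvf) < pos_threshold) & ~eye).float()

    # Temperature-scaled similarity logits, excluding self-pairs.
    logits = (z @ z.t()) / tau
    logits = logits.masked_fill(eye, -1e9)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Average log-likelihood of positives, over rows that have at least one positive.
    pos_per_row = pos_mask.sum(dim=1)
    has_pos = pos_per_row > 0
    if not has_pos.any():
        return z.new_zeros(())
    loss = -(pos_mask * log_prob).sum(dim=1)[has_pos] / pos_per_row[has_pos]
    return loss.mean()
```

In this reading, the GVF estimates play the role that data augmentations play in standard contrastive SSL: they supply the notion of "expected future behavior looks alike" needed to define positive pairs from a static dataset, and the resulting term would be added to whatever offline RL objective the agent already optimizes.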
Related papers
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z) - IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning [17.236861687708096]
Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge.
Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low buffer regimes.
arXiv Detail & Related papers (2024-04-28T12:25:09Z) - Closing the Gap between TD Learning and Supervised Learning -- A
Generalisation Point of View [51.30152184507165]
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training.
This oft-sought property is one of the few ways in which RL methods based on dynamic programming differ from RL methods based on supervised learning (SL).
It remains unclear whether SL-based methods forgo this important stitching property.
arXiv Detail & Related papers (2024-01-20T14:23:25Z) - RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization [23.417092819516185]
We introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization.
RL-ViGen contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions.
Our aspiration is that RL-ViGen will serve as a catalyst in the future creation of universal visual generalization RL agents.
arXiv Detail & Related papers (2023-07-15T05:45:37Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model changes in the environment in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Behavioral Priors and Dynamics Models: Improving Performance and Domain
Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z) - Cross-Trajectory Representation Learning for Zero-Shot Generalization in
RL [21.550201956884532]
The goal is to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
arXiv Detail & Related papers (2021-06-04T00:43:10Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving (a generic sketch of this kind of objective follows the list).
arXiv Detail & Related papers (2020-08-03T02:24:20Z)