Does Zero-Shot Reinforcement Learning Exist?
- URL: http://arxiv.org/abs/2209.14935v1
- Date: Thu, 29 Sep 2022 16:54:05 GMT
- Title: Does Zero-Shot Reinforcement Learning Exist?
- Authors: Ahmed Touati, Jérémy Rapin, Yann Ollivier
- Abstract summary: A zero-shot RL agent is an agent that can solve any RL task instantly with no additional planning or learning.
This marks a shift from the reward-centric RL paradigm towards "controllable" agents.
Strategies for approximate zero-shot RL have been suggested using successor features (SFs) or forward-backward (FB) representations.
- Score: 11.741744003560095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A zero-shot RL agent is an agent that can solve any RL task in a given
environment, instantly with no additional planning or learning, after an
initial reward-free learning phase. This marks a shift from the reward-centric
RL paradigm towards "controllable" agents that can follow arbitrary
instructions in an environment. Current RL agents can solve families of related
tasks at best, or require planning anew for each task. Strategies for
approximate zero-shot RL have been suggested using successor features (SFs)
[BBQ+ 18] or forward-backward (FB) representations [TO21], but testing has been
limited.
After clarifying the relationships between these schemes, we introduce
improved losses and new SF models, and test the viability of zero-shot RL
schemes systematically on tasks from the Unsupervised RL benchmark [LYL+21]. To
disentangle universal representation learning from exploration, we work in an
offline setting and repeat the tests on several existing replay buffers.
SFs appear to suffer from the choice of the elementary state features. SFs
with Laplacian eigenfunctions do well, while SFs based on auto-encoders,
inverse curiosity, transition models, low-rank transition matrix, contrastive
learning, or diversity (APS) perform inconsistently. In contrast, FB
representations jointly learn the elementary and successor features from a
single, principled criterion. They perform best and consistently across the
board, reaching 85% of supervised RL performance with a good replay buffer, in
a zero-shot manner.
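Reader's note (not part of the quoted abstract): a minimal worked sketch of the two zero-shot mechanisms the abstract contrasts, following the cited formulations [BBQ+ 18] and [TO21]. Here $\varphi$ denotes the elementary state features, $\psi$ the successor features, $F$ and $B$ the forward and backward maps, $\rho$ the distribution of the replay buffer, and $w$, $z$ task encodings. With successor features, assuming the task reward is approximately linear in the features, $r(s) \approx \varphi(s)^\top w$:
$$\psi^{\pi_w}(s,a) = \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t \varphi(s_{t+1}) \,\Big|\, s_0 = s,\ a_0 = a,\ \pi_w\Big], \qquad Q_r^{\pi_w}(s,a) \approx \psi^{\pi_w}(s,a)^\top w,$$
$$w \approx \arg\min_{w'} \mathbb{E}_{s \sim \rho}\big[(r(s) - \varphi(s)^\top w')^2\big], \qquad \pi_w(s) = \arg\max_a \psi^{\pi_w}(s,a)^\top w.$$
With forward-backward representations, the successor measure of the policy family $\pi_z$ is factored, and the task encoding is read directly off reward samples:
$$M^{\pi_z}(s,a,\mathrm{d}s') \approx F(s,a,z)^\top B(s')\, \rho(\mathrm{d}s'), \qquad z_r = \mathbb{E}_{s \sim \rho}\big[r(s)\, B(s)\big], \qquad \pi_{z_r}(s) = \arg\max_a F(s,a,z_r)^\top z_r.$$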
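A complementary code sketch of the test-time, zero-shot step only (the reward-free pretraining that produces the learned models is not shown). The names `phi_s`, `r_s`, `B_s`, `psi_sa`, `F_sa` are hypothetical placeholders for evaluations of those models on a batch of reward-labelled states from the replay buffer, not identifiers from the paper's code.

```python
import numpy as np

# Hypothetical placeholders (assumed, not from the paper's code):
#   phi_s  : (N, d) elementary state features phi(s) on a batch of buffer states
#   r_s    : (N,)   rewards of the new task evaluated on those states
#   B_s    : (N, d) backward embeddings B(s) on the same batch
#   psi_sa : (A, d) successor features at the current state, one row per action
#   F_sa   : (A, d) forward embeddings at the current state, one row per action

def sf_task_weights(phi_s: np.ndarray, r_s: np.ndarray, reg: float = 1e-6) -> np.ndarray:
    """SF zero-shot task inference: ridge regression of r(s) onto phi(s)^T w."""
    d = phi_s.shape[1]
    gram = phi_s.T @ phi_s + reg * np.eye(d)
    return np.linalg.solve(gram, phi_s.T @ r_s)

def fb_task_embedding(B_s: np.ndarray, r_s: np.ndarray) -> np.ndarray:
    """FB zero-shot task inference: z_r = E_rho[ r(s) B(s) ], a plain average."""
    return (r_s[:, None] * B_s).mean(axis=0)

def greedy_action(per_action_embeddings: np.ndarray, task_vector: np.ndarray) -> int:
    """Act greedily w.r.t. Q(s, a) ~ embedding(s, a)^T task_vector."""
    return int(np.argmax(per_action_embeddings @ task_vector))

# Usage sketch -- no further planning or learning once the task vector is known:
#   w   = sf_task_weights(phi_s, r_s);   action = greedy_action(psi_sa, w)
#   z_r = fb_task_embedding(B_s, r_s);   action = greedy_action(F_sa, z_r)
```

For the continuous-control tasks in the benchmark, the argmax over actions would be replaced by a task-conditioned actor, but the task-inference step stays the same.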
Related papers
- Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747]
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage.
In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels.
We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics.
arXiv Detail & Related papers (2024-05-22T13:39:33Z)
- Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings [107.1837163643886]
We present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem.
Our main idea is to learn functional representations of arbitrary tasks by encoding their state-reward samples.
We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks.
arXiv Detail & Related papers (2024-02-27T01:59:02Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle is more of an explanation of deep neural networks (DNNs) than an interpretation of RL agents.
We propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Continuous Coordination As a Realistic Scenario for Lifelong Learning [6.044372319762058]
We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods, and benchmark state-of-the-art lifelong learning (LLL) algorithms under limited memory and computation.
We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without the additional assumptions made by previous works.
arXiv Detail & Related papers (2021-03-04T18:44:03Z)
- Learning to Prune Deep Neural Networks via Reinforcement Learning [64.85939668308966]
PuRL is a deep reinforcement learning based algorithm for pruning neural networks.
It achieves sparsity and accuracy comparable to current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-09T13:06:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.