In-context Reinforcement Learning with Algorithm Distillation
- URL: http://arxiv.org/abs/2210.14215v1
- Date: Tue, 25 Oct 2022 17:57:49 GMT
- Title: In-context Reinforcement Learning with Algorithm Distillation
- Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen
Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan
Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
- Abstract summary: We propose a method for distilling reinforcement learning algorithms into neural networks by modeling their training histories with a causal sequence model.
A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context.
We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
- Score: 35.72003039559626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Algorithm Distillation (AD), a method for distilling reinforcement
learning (RL) algorithms into neural networks by modeling their training
histories with a causal sequence model. Algorithm Distillation treats learning
to reinforcement learn as an across-episode sequential prediction problem. A
dataset of learning histories is generated by a source RL algorithm, and then a
causal transformer is trained by autoregressively predicting actions given
their preceding learning histories as context. Unlike sequential policy
prediction architectures that distill post-learning or expert sequences, AD is
able to improve its policy entirely in-context without updating its network
parameters. We demonstrate that AD can reinforcement learn in-context in a
variety of environments with sparse rewards, combinatorial task structure, and
pixel-based observations, and find that AD learns a more data-efficient RL
algorithm than the one that generated the source data.
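To make the method concrete, the following is a minimal sketch of the AD training objective as described above: a causal transformer is trained with cross-entropy to predict each action from the across-episode history that precedes it. The module names, tensor shapes, and hyperparameters are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of the Algorithm Distillation training objective, assuming
# discrete actions and pre-collected learning histories. All names, shapes,
# and hyperparameters here are illustrative; this is not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ADTransformer(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=128, n_heads=4, n_layers=4, max_len=1024):
        super().__init__()
        # Each timestep token embeds (observation, previous action, previous reward),
        # so the context carries the source algorithm's learning progress.
        self.embed = nn.Linear(obs_dim + n_actions + 1, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs, prev_act, prev_rew):
        # obs: (B, T, obs_dim); prev_act: (B, T, n_actions) one-hot; prev_rew: (B, T, 1)
        T = obs.shape[1]
        x = self.embed(torch.cat([obs, prev_act, prev_rew], dim=-1))
        x = x + self.pos(torch.arange(T, device=obs.device))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=obs.device), 1)
        h = self.encoder(x, mask=causal)  # step t attends only to steps <= t
        return self.head(h)               # logits for the action taken at step t

def ad_loss(model, batch):
    # batch is a contiguous slice of one learning history that crosses episode
    # boundaries, so prediction must model policy improvement, not a fixed policy.
    logits = model(batch["obs"], batch["prev_act"], batch["prev_rew"])
    return F.cross_entropy(logits.flatten(0, 1), batch["act"].flatten())
```

At evaluation time the same network is rolled out with its own growing history fed back in as context; because the history encodes learning progress, the policy can keep improving across episodes without any parameter updates.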
Related papers
- Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent in the next training batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
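As a schematic illustration of this pretraining objective (the function names and tensor shapes below are assumptions, not the authors' code):

```python
# Schematic DPT pretraining step: condition a transformer on an in-context
# dataset of interactions plus a query state, and supervise it to output the
# optimal action. Names and tensor shapes are assumptions for illustration.
import torch.nn.functional as F

def dpt_loss(transformer, context, query_state, optimal_action):
    # context: (B, N, transition_dim) transitions sampled from one task
    # query_state: (B, state_dim); optimal_action: (B,) action indices
    logits = transformer(context, query_state)  # (B, n_actions)
    return F.cross_entropy(logits, optimal_action)
```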
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Reinforcement Learning for Topic Models [3.42658286826597]
We apply reinforcement learning techniques to topic modeling by replacing the variational autoencoder in ProdLDA with a continuous action space reinforcement learning policy.
We introduce several modifications: we modernize the neural network architecture, weight the ELBO loss, use contextual embeddings, and monitor the learning process by computing topic diversity and coherence.
arXiv Detail & Related papers (2023-05-08T16:41:08Z)
- KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning [25.421649004269373]
In task-oriented dialogs (TOD), reinforcement learning algorithms train a model to directly optimize responses for task-related metrics.
We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting.
Experiments on the MultiWOZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-11-30T06:27:46Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
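A hedged sketch of the central quantity in such a formulation, reading per-token logits as Q-values and computing the soft Bellman target (the names and temperature are illustrative assumptions):

```python
# Soft Bellman target for token-level Q-learning over a vocabulary, in the
# spirit of the summary above. Names and reward shaping are assumptions.
import torch

def soft_q_target(q_next, reward, gamma=1.0, tau=1.0):
    # q_next: (B, T, V) Q-values at the next token step; reward: (B, T)
    # Soft state value: V(s) = tau * logsumexp(Q(s, .) / tau)
    v_next = tau * torch.logsumexp(q_next / tau, dim=-1)
    return reward + gamma * v_next  # regression target for Q(s_t, a_t)
```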
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance on other classical control tasks, gridworld-type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
The discovered rule specifies both 'what to predict' (e.g. value functions) and 'how to learn from it', and is found by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- Self-Paced Deep Reinforcement Learning [42.467323141301826]
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach yields automatic curriculum generation whose pace is controlled by the agent; it has solid theoretical motivation and integrates easily with deep RL algorithms.
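A simplified sketch of this pacing idea, modeling the curriculum as a Gaussian over task parameters that drifts toward the target task once the agent is competent (the parameterization and pacing rule are assumptions, not the paper's inference procedure):

```python
# Illustrative self-paced curriculum: shift a Gaussian task distribution
# toward the target task once the agent clears a competence threshold.
# The Gaussian parameterization and pacing rule are simplifying assumptions.
import numpy as np

def pace_distribution(mu, sigma, mu_target, sigma_target, agent_return, threshold, step=0.1):
    # Advance the curriculum only when current performance is good enough,
    # so the pace is effectively controlled by the agent.
    if agent_return >= threshold:
        mu += step * (mu_target - mu)
        sigma += step * (sigma_target - sigma)
    return mu, sigma

def sample_tasks(mu, sigma, n=16, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    return rng.normal(mu, sigma, size=n)  # task parameters for the next batch
```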
arXiv Detail & Related papers (2020-04-24T15:48:07Z)