Supervised Pretraining Can Learn In-Context Reinforcement Learning
- URL: http://arxiv.org/abs/2306.14892v1
- Date: Mon, 26 Jun 2023 17:58:50 GMT
- Title: Supervised Pretraining Can Learn In-Context Reinforcement Learning
- Authors: Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
- Abstract summary: In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
- Score: 96.62869749926415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large transformer models trained on diverse datasets have shown a remarkable
ability to learn in-context, achieving high few-shot performance on tasks they
were not explicitly trained to solve. In this paper, we study the in-context
learning capabilities of transformers in decision-making problems, i.e.,
reinforcement learning (RL) for bandits and Markov decision processes. To do
so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised
pretraining method where the transformer predicts an optimal action given a
query state and an in-context dataset of interactions, across a diverse set of
tasks. This procedure, while simple, produces a model with several surprising
capabilities. We find that the pretrained transformer can be used to solve a
range of RL problems in-context, exhibiting both exploration online and
conservatism offline, despite not being explicitly trained to do so. The model
also generalizes beyond the pretraining distribution to new tasks and
automatically adapts its decision-making strategies to unknown structure.
Theoretically, we show DPT can be viewed as an efficient implementation of
Bayesian posterior sampling, a provably sample-efficient RL algorithm. We
further leverage this connection to provide guarantees on the regret of the
in-context algorithm yielded by DPT, and prove that it can learn faster than
algorithms used to generate the pretraining data. These results suggest a
promising yet simple path towards instilling strong in-context decision-making
abilities in transformers.
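To make the pretraining objective described in the abstract concrete, the following is a minimal sketch for a Bernoulli bandit. It is an illustration under stated assumptions, not the authors' implementation: the function names, the uniform-random behavior policy, the uniform prior over arm means, and the Beta(1, 1) posterior are all hypothetical choices made for this example. The first function builds one supervised example (an in-context dataset plus the optimal-action label a model would be trained to predict); the second shows explicit Bayesian posterior sampling, the algorithm the abstract says DPT can be viewed as implementing. Bandits have no state, so the query state is omitted here.

```python
import random


def sample_bandit_task(num_arms, rng):
    """Sample a hypothetical Bernoulli bandit: one success probability per arm."""
    return [rng.random() for _ in range(num_arms)]


def make_pretraining_example(num_arms, context_len, rng):
    """Build one supervised example: (in-context dataset, optimal-action label).

    The in-context dataset is a list of (arm, reward) interactions collected
    by an assumed uniform-random behavior policy on a freshly sampled task.
    """
    means = sample_bandit_task(num_arms, rng)
    context = []
    for _ in range(context_len):
        arm = rng.randrange(num_arms)
        reward = 1 if rng.random() < means[arm] else 0
        context.append((arm, reward))
    # The label is the truly optimal arm of the sampled task, which is known
    # at pretraining time because the task itself was sampled.
    optimal_arm = max(range(num_arms), key=lambda a: means[a])
    return context, optimal_arm


def posterior_sampling_action(context, num_arms, rng):
    """Pick an arm by posterior (Thompson) sampling with Beta(1, 1) priors."""
    wins = [1] * num_arms
    losses = [1] * num_arms
    for arm, reward in context:
        if reward:
            wins[arm] += 1
        else:
            losses[arm] += 1
    # Draw one plausible mean per arm from its posterior, then act greedily
    # with respect to the sampled means.
    draws = [rng.betavariate(wins[a], losses[a]) for a in range(num_arms)]
    return max(range(num_arms), key=lambda a: draws[a])


if __name__ == "__main__":
    rng = random.Random(0)
    context, label = make_pretraining_example(num_arms=5, context_len=20, rng=rng)
    print("in-context dataset:", context)
    print("optimal-action label:", label)
    print("posterior-sampling action:", posterior_sampling_action(context, 5, rng))
```

In this sketch, a sequence model trained to map `context` to `label` across many sampled tasks would play the role of the pretrained transformer, while `posterior_sampling_action` is the hand-written baseline it is compared against in spirit.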
Related papers
- Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [10.338170161831496]
Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks.
We introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA).
Our approach integrates pre-trained language models with RL tasks seamlessly.
arXiv Detail & Related papers (2024-08-02T17:25:34Z)
- Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning [12.608461657195367]
We study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret.
We use a transformer as a decision-making algorithm to learn this shared structure so as to generalize to the test task.
We show that our algorithm, without the knowledge of the underlying problem structure, can learn a near-optimal policy in-context.
arXiv Detail & Related papers (2024-06-07T16:34:31Z)
- Transformers for Supervised Online Continual Learning [11.270594318662233]
We propose a method that leverages transformers' in-context learning capabilities for online continual learning.
Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.
arXiv Detail & Related papers (2024-03-03T16:12:20Z)
- Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining [25.669038513039357]
This paper provides a theoretical framework that analyzes supervised pretraining for in-context reinforcement learning.
We show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms.
arXiv Detail & Related papers (2023-10-12T17:55:02Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- Future-conditioned Unsupervised Pretraining for Decision Transformer [19.880628629512504]
We propose Pretrained Decision Transformer (PDT) as a conceptually simple approach for unsupervised RL pretraining.
PDT leverages future trajectory information as a privileged context to predict actions during training.
It can extract diverse behaviors from offline data and controllably sample high-return behaviors by online finetuning.
arXiv Detail & Related papers (2023-05-26T07:05:08Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive.
The experiments also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.