Related papers: Supervised Pretraining Can Learn In-Context Reinforcement Learning

URL: http://arxiv.org/abs/2306.14892v1
Date: Mon, 26 Jun 2023 17:58:50 GMT
Title: Supervised Pretraining Can Learn In-Context Reinforcement Learning
Authors: Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Abstract summary: In this paper, we study the in-context learning capabilities of transformers in decision-making problems. We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
Score: 96.62869749926415
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers.
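The abstract relates DPT to Bayesian posterior sampling. As a point of reference, here is a minimal sketch of classical posterior (Thompson) sampling on a Bernoulli bandit. It is not the DPT method and involves no transformer; the function name `thompson_sampling`, the Beta(1, 1) priors, and the example arm means are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Posterior (Thompson) sampling on a Bernoulli bandit.

    Illustrative assumption: each arm's unknown mean has a Beta(1, 1) prior.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its current Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act optimally with respect to the sampled task.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Hypothetical two-armed task: arm 1 is clearly better.
pulls = thompson_sampling([0.1, 0.9], horizon=1000)
```

Early on the posteriors are wide, so both arms get sampled (exploration); as evidence accumulates, the samples concentrate and the better arm dominates.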

Related papers

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [10.338170161831496]
Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks. We introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA). Our approach integrates pre-trained language models and RL tasks seamlessly.
arXiv Detail & Related papers (2024-08-02T17:25:34Z)
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning [12.608461657195367]
We study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. We use a transformer as a decision-making algorithm to learn this shared structure so as to generalize to the test task. We show that our algorithm, without knowledge of the underlying problem structure, can learn a near-optimal policy in-context.
arXiv Detail & Related papers (2024-06-07T16:34:31Z)
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning [17.714908233024847]
Reinforcement learning (RL) agents learn to solve new tasks by updating their neural network parameters through interactions with the task environment. Recent works demonstrate that some RL agents, after certain pretraining procedures, can learn to solve unseen new tasks without parameter updates.
arXiv Detail & Related papers (2024-05-22T17:38:16Z)
Transformers for Supervised Online Continual Learning [11.270594318662233]
We propose a method that leverages transformers' in-context learning capabilities for online continual learning. Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.
arXiv Detail & Related papers (2024-03-03T16:12:20Z)
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining [25.669038513039357]
This paper provides a theoretical framework that analyzes supervised pretraining for in-context reinforcement learning. We show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms.
arXiv Detail & Related papers (2023-10-12T17:55:02Z)
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL. We show that transformers can implement a broad class of standard machine learning algorithms in context. A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
Future-conditioned Unsupervised Pretraining for Decision Transformer [19.880628629512504]
We propose Pretrained Decision Transformer (PDT) as a conceptually simple approach for unsupervised RL pretraining. PDT leverages future trajectory information as a privileged context to predict actions during training. It can extract diverse behaviors from offline data and controllably sample high-return behaviors by online finetuning.
arXiv Detail & Related papers (2023-05-26T07:05:08Z)
Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems. It exploits the combination of reinforcement learning and latent variable generative models. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive. The experiments also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials. We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.