Future-conditioned Unsupervised Pretraining for Decision Transformer
- URL: http://arxiv.org/abs/2305.16683v1
- Date: Fri, 26 May 2023 07:05:08 GMT
- Title: Future-conditioned Unsupervised Pretraining for Decision Transformer
- Authors: Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
- Abstract summary: We propose Pretrained Decision Transformer (PDT) as a conceptually simple approach for unsupervised RL pretraining.
PDT leverages future trajectory information as a privileged context to predict actions during training.
It can extract diverse behaviors from offline data and controllably sample high-return behaviors by online finetuning.
- Score: 19.880628629512504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research in offline reinforcement learning (RL) has demonstrated that
return-conditioned supervised learning is a powerful paradigm for
decision-making problems. While promising, return conditioning is limited to
training data labeled with rewards and therefore faces challenges in learning
from unsupervised data. In this work, we aim to utilize generalized future
conditioning to enable efficient unsupervised pretraining from reward-free and
sub-optimal offline data. We propose Pretrained Decision Transformer (PDT), a
conceptually simple approach for unsupervised RL pretraining. PDT leverages
future trajectory information as a privileged context to predict actions during
training. The ability to make decisions based on both present and future
factors enhances PDT's capability for generalization. Besides, this feature can
be easily incorporated into a return-conditioned framework for online
finetuning, by assigning return values to possible futures and sampling future
embeddings based on their respective values. Empirically, PDT outperforms or
performs on par with its supervised pretraining counterpart, especially when
dealing with sub-optimal data. Further analysis reveals that PDT can extract
diverse behaviors from offline data and controllably sample high-return
behaviors by online finetuning. Code is available online.
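To make the future-conditioning idea above concrete, here is a minimal, illustrative PyTorch sketch (not the authors' implementation): a small encoder compresses a future trajectory segment into an embedding, a policy head (a simple MLP standing in for the Decision Transformer backbone) predicts the current action from the current state plus that embedding, and, for finetuning, candidate future embeddings are sampled in proportion to a learned value estimate. All module names, shapes, and hyperparameters are assumptions made for illustration.
```python
# Hedged sketch of future-conditioned pretraining; not the authors' code.
import torch
import torch.nn as nn

class FutureEncoder(nn.Module):
    """Encodes a future trajectory segment into a compact embedding z."""
    def __init__(self, state_dim, embed_dim=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim, embed_dim, batch_first=True)

    def forward(self, future_states):          # (B, T_future, state_dim)
        _, h = self.rnn(future_states)         # h: (1, B, embed_dim)
        return h.squeeze(0)                    # (B, embed_dim)

class FutureConditionedPolicy(nn.Module):
    """Predicts the current action from the current state and a future embedding."""
    def __init__(self, state_dim, action_dim, embed_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

# --- unsupervised pretraining step: behavior cloning conditioned on the future ---
state_dim, action_dim, B, T_future = 17, 6, 32, 20
encoder = FutureEncoder(state_dim)
policy = FutureConditionedPolicy(state_dim, action_dim)
value_head = nn.Linear(64, 1)                  # only needed later, at finetuning time
opt = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()), lr=3e-4)

states = torch.randn(B, state_dim)             # stand-in for an offline, reward-free batch
actions = torch.randn(B, action_dim)
future_states = torch.randn(B, T_future, state_dim)

z = encoder(future_states)                     # privileged future context
loss = ((policy(states, z) - actions) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()

# --- finetuning idea: score candidate future embeddings by a predicted return
# (value_head would be trained on return-labeled data) and sample high-value ones.
with torch.no_grad():
    candidate_z = encoder(torch.randn(8, T_future, state_dim))   # 8 candidate futures
    values = value_head(candidate_z).squeeze(-1)
    probs = torch.softmax(values, dim=0)
    chosen = candidate_z[torch.multinomial(probs, 1)]            # bias toward high-return futures
    action = policy(states[:1], chosen)
```
During reward-free pretraining only the behavior-cloning loss is needed; the value head over future embeddings is what lets the return-conditioned finetuning stage bias sampling toward high-return futures, as described in the abstract.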
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Strategies for Pretraining Neural Operators [5.812284760539713]
Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance.
We compare various pretraining methods without optimizing architecture choices to characterize pretraining dynamics.
We find that pretraining is highly dependent on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best.
arXiv Detail & Related papers (2024-06-12T17:56:46Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term that maximizes action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining [25.669038513039357]
This paper provides a theoretical framework that analyzes supervised pretraining for in-context reinforcement learning.
We show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms.
arXiv Detail & Related papers (2023-10-12T17:55:02Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL [0.0]
Decision Transformer (DT) combines the conditional policy approach and a transformer architecture.
DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy.
We propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT.
arXiv Detail & Related papers (2022-09-08T18:26:39Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble [135.6115462399788]
Deep offline reinforcement learning has made it possible to train strong robotic agents from offline datasets.
State-action distribution shift may lead to severe bootstrap error during fine-tuning.
We propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples.
arXiv Detail & Related papers (2021-07-01T16:26:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.