Prompting Decision Transformer for Few-Shot Policy Generalization
- URL: http://arxiv.org/abs/2206.13499v1
- Date: Mon, 27 Jun 2022 17:59:17 GMT
- Title: Prompting Decision Transformer for Few-Shot Policy Generalization
- Authors: Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B.
Tenenbaum, Chuang Gan
- Abstract summary: We propose a Prompt-based Decision Transformer (Prompt-DT) to achieve few-shot adaptation in offline RL.
Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks.
- Score: 98.0914217850999
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Humans can leverage prior experience and learn novel tasks from a handful of
demonstrations. In contrast to offline meta-reinforcement learning, which aims
to achieve quick adaptation through better algorithm design, we investigate the
effect of architecture inductive bias on the few-shot learning capability. We
propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the
sequential modeling ability of the Transformer architecture and the prompt
framework to achieve few-shot adaptation in offline RL. We design the
trajectory prompt, which contains segments of the few-shot demonstrations, and
encodes task-specific information to guide policy generation. Our experiments
in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot
learner without any extra finetuning on unseen target tasks. Prompt-DT
outperforms its variants and strong meta offline RL baselines by a large margin
with a trajectory prompt containing only a few timesteps. Prompt-DT is also
robust to prompt length changes and can generalize to out-of-distribution (OOD)
environments.
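To make the mechanism described in the abstract concrete, the sketch below shows one way a trajectory prompt could be assembled from few-shot demonstration segments and prepended to the agent's recent (return-to-go, state, action) context before being fed to a causal Transformer. This is a minimal illustration, not the authors' code: the trajectory field names (returns_to_go, states, actions), the helper build_prompt_dt_input, and the stacking of each timestep's triple into a single row are assumptions made for brevity; in Decision Transformer-style models each modality is typically embedded as its own token.

```python
# Minimal, illustrative sketch of Prompt-DT-style input construction.
# Not the authors' implementation; field and helper names are hypothetical.
import numpy as np

def build_prompt_dt_input(demo_trajs, recent_traj, prompt_steps=5, context_steps=20):
    """Prepend short segments of few-shot demonstrations (the trajectory prompt)
    to the agent's recent context. Each timestep carries a (return-to-go, state,
    action) triple; here the triple is stacked into one row per timestep for
    brevity, rather than embedded as three separate tokens."""
    keys = ("returns_to_go", "states", "actions")
    segments = []
    # Trajectory prompt: the first few timesteps of each demonstration.
    for traj in demo_trajs:
        segments.append({k: traj[k][:prompt_steps] for k in keys})
    # Recent history collected on the target task.
    segments.append({k: recent_traj[k][-context_steps:] for k in keys})

    rows = []
    for seg in segments:
        for r, s, a in zip(seg["returns_to_go"], seg["states"], seg["actions"]):
            rows.append(np.concatenate([np.atleast_1d(r), s, a]))
    return np.stack(rows)  # (num_timesteps, feature_dim); input to a causal Transformer

# Toy usage with state_dim=3, act_dim=2:
rng = np.random.default_rng(0)
make = lambda T: {"returns_to_go": rng.random(T),
                  "states": rng.random((T, 3)),
                  "actions": rng.random((T, 2))}
seq = build_prompt_dt_input([make(10), make(10)], make(50))
print(seq.shape)  # (2*5 + 20, 6)
```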
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [10.338170161831496]
Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks.
We introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA).
Our approach integrates pre-trained language models and RL tasks seamlessly.
arXiv Detail & Related papers (2024-08-02T17:25:34Z)
- PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer [76.39111896665585]
Incremental Learning (IL) aims to learn deep models on sequential tasks continually.
Recent large pre-trained models (PTMs) have achieved outstanding performance via prompt techniques in practical IL without access to the old samples.
arXiv Detail & Related papers (2024-07-04T10:37:58Z)
- Exploring the Transferability of Visual Prompting for Multimodal Large Language Models [47.162575147632396]
Transferable Visual Prompting (TVP) is a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after being trained on only one model.
We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts.
arXiv Detail & Related papers (2024-04-17T09:39:07Z)
- P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer [39.16560969128012]
Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model.
We propose a novel solution, the Progressive Prompt Decision Transformer (P2DT).
This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies.
arXiv Detail & Related papers (2024-01-22T02:58:53Z)
- Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address this challenge by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling from a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
- Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z)
- TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement Learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves an average 5.33x improvement in sample efficiency compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)