Prompting Decision Transformer for Few-Shot Policy Generalization
- URL: http://arxiv.org/abs/2206.13499v1
- Date: Mon, 27 Jun 2022 17:59:17 GMT
- Title: Prompting Decision Transformer for Few-Shot Policy Generalization
- Authors: Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B.
Tenenbaum, Chuang Gan
- Abstract summary: We propose a Prompt-based Decision Transformer (Prompt-DT) to achieve few-shot adaptation in offline RL.
Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks.
- Score: 98.0914217850999
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Humans can leverage prior experience and learn novel tasks from a handful of
demonstrations. In contrast to offline meta-reinforcement learning, which aims
to achieve quick adaptation through better algorithm design, we investigate the
effect of architecture inductive bias on the few-shot learning capability. We
propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the
sequential modeling ability of the Transformer architecture and the prompt
framework to achieve few-shot adaptation in offline RL. We design the
trajectory prompt, which contains segments of the few-shot demonstrations and
encodes task-specific information to guide policy generation. Our experiments
in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot
learner without any extra finetuning on unseen target tasks. Prompt-DT
outperforms its variants and strong meta offline RL baselines by a large margin
with a trajectory prompt containing only a few timesteps. Prompt-DT is also
robust to prompt length changes and can generalize to out-of-distribution (OOD)
environments.
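To make the mechanism concrete: the trajectory prompt is a short sequence of (return-to-go, state, action) tuples taken from the few-shot demonstrations of the target task, prepended to the Decision Transformer's usual recent-history context. The sketch below illustrates that input construction only; the function and array names are illustrative assumptions rather than the authors' code, and the actual model further interleaves these modalities as tokens inside a GPT-style transformer.

```python
import numpy as np

def build_prompt_dt_input(prompt_traj, recent_traj):
    """Concatenate a trajectory prompt with the agent's recent context.

    Both arguments are dicts with 'returns_to_go', 'states', and 'actions'
    arrays of shape (K, ...): the prompt holds a few timesteps sampled from
    demonstrations of the target task, the recent context holds the last
    timesteps of the current episode.
    """
    return {
        key: np.concatenate([prompt_traj[key], recent_traj[key]], axis=0)
        for key in ("returns_to_go", "states", "actions")
    }

# Toy example: a 2-step demonstration prompt plus a 4-step recent context.
rng = np.random.default_rng(0)
prompt = {
    "returns_to_go": rng.normal(size=(2, 1)),
    "states": rng.normal(size=(2, 3)),
    "actions": rng.normal(size=(2, 1)),
}
context = {
    "returns_to_go": rng.normal(size=(4, 1)),
    "states": rng.normal(size=(4, 3)),
    "actions": rng.normal(size=(4, 1)),
}
tokens = build_prompt_dt_input(prompt, context)
print(tokens["states"].shape)  # (6, 3): prompt timesteps followed by context
```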
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [10.338170161831496]
Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks.
We introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA).
Our approach integrates pre-trained language models and RL tasks seamlessly.
arXiv Detail & Related papers (2024-08-02T17:25:34Z) - PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer [76.39111896665585]
Incremental Learning (IL) aims to learn deep models on sequential tasks continually.
Recent vast pre-trained models (PTMs) have achieved outstanding performance via prompting techniques in practical IL without access to old samples.
arXiv Detail & Related papers (2024-07-04T10:37:58Z) - Exploring the Transferability of Visual Prompting for Multimodal Large Language Models [47.162575147632396]
Transferable Visual Prompting (TVP) is a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after being trained on only one model.
We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts.
arXiv Detail & Related papers (2024-04-17T09:39:07Z) - P2DT: Mitigating Forgetting in task-incremental Learning with
progressive prompt Decision Transformer [39.16560969128012]
Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model.
We propose a novel solution - the Progressive Prompt Decision Transformer (P2DT).
This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies.
arXiv Detail & Related papers (2024-01-22T02:58:53Z) - Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling from a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z) - Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z) - TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves a 5.33x improvement in sample efficiency on average compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask
Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)