Towards bandit-based prompt-tuning for in-the-wild foundation agents
- URL: http://arxiv.org/abs/2502.06358v2
- Date: Tue, 11 Feb 2025 10:54:40 GMT
- Title: Towards bandit-based prompt-tuning for in-the-wild foundation agents
- Authors: Finn Rietz, Oleg Smirnov, Sara Karimi, Lele Cao
- Abstract summary: We propose an inference-time bandit-based prompt-tuning framework to enhance task performance.
Our experiments indicate not only clear performance gains due to bandit-based prompt-tuning, but also better sample complexity, scalability, and prompt space exploration.
- Score: 2.6731152954002924
- Abstract: Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline reinforcement learning pre-training by leveraging stochastic trajectory prompts to identify the target task. However, these prompts are sampled uniformly from expert demonstrations, overlooking a critical limitation: not all prompts are equally informative for differentiating between tasks. To address this, we propose an inference-time bandit-based prompt-tuning framework that explores and optimizes trajectory prompt selection to enhance task performance. Our experiments indicate not only clear performance gains due to bandit-based prompt-tuning, but also better sample complexity, scalability, and prompt space exploration compared to prompt-tuning baselines.
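To make the idea of bandit-based prompt selection concrete, here is a minimal sketch in which each candidate trajectory prompt is treated as one arm of a multi-armed bandit and the observed episode return is the reward. It uses the standard UCB1 rule as a stand-in and is not the authors' method, whose details are not given in this listing. The names `ucb1_prompt_selection`, `evaluate_prompt`, and `make_simulated_rollout` are hypothetical, and the simulated rollout replaces what would in practice be a rollout of the frozen pre-trained agent conditioned on the chosen prompt.

```python
import math
import random


def ucb1_prompt_selection(num_prompts, evaluate_prompt, num_rounds, exploration=2.0):
    """Pick among candidate prompts with UCB1 (illustrative stand-in, not the paper's algorithm).

    evaluate_prompt(i) is a hypothetical callable that rolls out the frozen
    agent with candidate prompt i and returns a scalar episode return.
    Returns (index of best prompt by mean return, mean returns, pull counts).
    """
    if num_rounds < num_prompts:
        raise ValueError("num_rounds must be at least num_prompts")

    counts = [0] * num_prompts   # how often each prompt was tried
    means = [0.0] * num_prompts  # running mean return per prompt

    for t in range(1, num_rounds + 1):
        if t <= num_prompts:
            arm = t - 1  # try every prompt once before using the UCB rule
        else:
            # Optimism in the face of uncertainty: mean plus exploration bonus.
            arm = max(
                range(num_prompts),
                key=lambda i: means[i] + math.sqrt(exploration * math.log(t) / counts[i]),
            )
        reward = evaluate_prompt(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update

    best = max(range(num_prompts), key=lambda i: means[i])
    return best, means, counts


def make_simulated_rollout(true_means, noise=0.1, seed=0):
    """Hypothetical environment stub: noisy return around a fixed mean per prompt."""
    rng = random.Random(seed)

    def evaluate_prompt(arm):
        return true_means[arm] + rng.gauss(0.0, noise)

    return evaluate_prompt


if __name__ == "__main__":
    rollout = make_simulated_rollout([0.2, 0.5, 0.9, 0.4])
    best, means, counts = ucb1_prompt_selection(4, rollout, num_rounds=200)
    print("selected prompt:", best)
    print("pull counts:", counts)
```

The point of the sketch is the contrast with uniform sampling: evaluations concentrate on prompts that have produced higher returns, while the exploration bonus keeps rarely tried prompts from being discarded too early. The backbone model is never updated; only the choice of prompt changes.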
Related papers
- Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits [2.6731152954002924]
We introduce a scalable bandit-based prompt-tuning method that learns to construct high-performance trajectory prompts.
Our approach significantly enhances downstream task performance without modifying the pre-trained Transformer backbone.
arXiv Detail & Related papers (2025-02-07T14:57:17Z) - Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z) - Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
arXiv Detail & Related papers (2024-07-20T03:10:19Z) - Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning [44.43258626098661]
We argue that when we extract knowledge from source tasks via training source prompts, we need to consider this correlation among source tasks for better transfer to target tasks.
We propose a Bayesian approach where we work with the posterior distribution of prompts across source tasks.
We show extensive experimental results on the standard benchmark NLP tasks, where our Bayesian multi-task transfer learning approach outperforms the state-of-the-art methods in many settings.
arXiv Detail & Related papers (2024-02-13T16:57:02Z) - Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores the correlation evolvement between prompts and patch tokens during proficient training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z) - Active Instruction Tuning: Improving Cross-Task Generalization by
Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z) - Self-regulating Prompts: Foundational Model Adaptation without
Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Prompting Decision Transformer for Few-Shot Policy Generalization [98.0914217850999]
We propose a Prompt-based Decision Transformer (Prompt-DT) to achieve few-shot adaptation in offline RL.
Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks.
arXiv Detail & Related papers (2022-06-27T17:59:17Z) - On Transferability of Prompt Tuning for Natural Language Understanding [63.29235426932978]
We investigate the transferability of soft prompts across different tasks and models.
We find that trained soft prompts transfer well to similar tasks and can initialize PT for them to accelerate training and improve performance.
Our findings show that improving PT with knowledge transfer is possible and promising, while prompts' cross-task transferability is generally better than the cross-model transferability.
arXiv Detail & Related papers (2021-11-12T13:39:28Z)