Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization
- URL: http://arxiv.org/abs/2411.01168v1
- Date: Sat, 02 Nov 2024 07:38:02 GMT
- Title: Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization
- Authors: Shengchao Hu, Wanru Zhao, Weixiong Lin, Li Shen, Ya Zhang, Dacheng Tao,
- Abstract summary: We develop a conditional diffusion model to produce prompts of exceptional quality for offline reinforcement learning tasks.
We show that the Prompt Diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance on meta-RL tasks.
- Score: 55.14484317645865
- Abstract: Offline reinforcement learning (RL) methods harness previous experiences to derive an optimal policy, forming the foundation for pre-trained large-scale models (PLMs). When encountering tasks not seen before, PLMs often utilize several expert trajectories as prompts to expedite their adaptation to new requirements. Though a range of prompt-tuning methods have been proposed to enhance the quality of prompts, these methods often face optimization restrictions due to prompt initialization, which can significantly constrain the exploration domain and potentially lead to suboptimal solutions. To eliminate the reliance on the initial prompt, we shift our perspective towards the generative model, framing the prompt-tuning process as a form of conditional generative modeling, where prompts are generated from random noise. Our innovation, the Prompt Diffuser, leverages a conditional diffusion model to produce prompts of exceptional quality. Central to our framework is the approach to trajectory reconstruction and the meticulous integration of downstream task guidance during the training phase. Further experimental results underscore the potency of the Prompt Diffuser as a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
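The abstract frames prompt tuning as conditional generation: prompts start as random noise and are iteratively denoised under task guidance. The sketch below illustrates only that mechanism. It is a minimal conditional DDPM sampler in pure Python, assuming a linear noise schedule and a noise-prediction network `eps_model(x_t, t, cond)`. Every name in it is hypothetical: `make_schedule`, `sample_prompt`, `oracle_eps`, the flat prompt vector, and the string condition. It does not reproduce the Prompt Diffuser's architecture, trajectory reconstruction, or downstream-task guidance.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Schedule = Tuple[Vector, Vector, Vector]

def make_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> Schedule:
    """Linear noise schedule: per-step betas, alphas = 1 - beta, and cumulative products alpha_bar."""
    span = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * i / span for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def q_sample(x0: Sequence[float], t: int, schedule: Schedule, noise: Sequence[float]) -> Vector:
    """Forward (noising) process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    ab = schedule[2][t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]

# Hypothetical denoiser signature: (noisy prompt, timestep, condition) -> predicted noise.
EpsModel = Callable[[Vector, int, str], Vector]

def sample_prompt(eps_model: EpsModel, cond: str, dim: int, schedule: Schedule,
                  rng: random.Random) -> Vector:
    """Reverse (denoising) process: start from Gaussian noise and repeatedly remove
    the noise predicted by eps_model, which is conditioned on `cond` at every step."""
    betas, alphas, alpha_bars = schedule
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps_hat)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple variance choice: sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x

# Usage with a hypothetical "oracle" denoiser standing in for a trained network:
# it knows the clean prompt vector for each condition and returns the exact noise.
TARGETS = {"task_a": [0.5, -1.0, 2.0, 0.0]}
SCHEDULE = make_schedule(1000)

def oracle_eps(x_t: Vector, t: int, cond: str) -> Vector:
    ab = SCHEDULE[2][t]
    return [(xi - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab) for xi, x0 in zip(x_t, TARGETS[cond])]

prompt = sample_prompt(oracle_eps, "task_a", 4, SCHEDULE, random.Random(0))
```

The oracle returns the exact noise, so the final denoising step lands on the target prompt whatever the starting noise was. A trained network would only approximate this. Conditioning on `cond` is what lets a single model produce different prompts for different tasks.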
Related papers
- Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization.
To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
Date: 2024-10-08T07:33:49Z
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
Date: 2024-09-30T10:48:20Z
- FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models [10.969811500333755]
We introduce a Fine-tuning Initial Noise Distribution (FIND) framework with policy optimization.
Our method is 10 times faster than the state-of-the-art (SOTA) approach.
Date: 2024-07-28T10:07:55Z
- Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
Date: 2024-07-20T03:10:19Z
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further used to preserve the zero-shot generalization of VLMs; the combined method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to improve the model in the few-shot image classification scenario.
Date: 2024-07-11T10:35:53Z
- Dreamguider: Improved Training free Diffusion-based Conditional Generation [31.68823843900196]
Dreamguider is a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network.
We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules.
Date: 2024-06-04T17:59:32Z
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
Date: 2024-02-22T04:10:57Z
- Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach samples from a Gaussian distribution to fine-tune the elements of the prompt trajectory and uses a preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
Date: 2023-05-16T17:49:04Z
- Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework.
It helps pre-trained models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
Date: 2023-03-12T05:03:37Z
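The Prompt-Tuning Decision Transformer entry above perturbs a prompt with Gaussian samples and uses a preference ranking to choose the update direction. The sketch below shows that idea as a rank-based, evolution-strategies-style zeroth-order update. It makes several assumptions:

- The prompt is a flat list of floats.
- `rank_fn` is a hypothetical function that orders candidates from most to least preferred.
- The centered linear rank weights, `sigma`, and `lr` are illustrative choices, not the paper's settings.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
RankFn = Callable[[List[Vector]], List[int]]

def rank_based_update(prompt: Sequence[float], rank_fn: RankFn, num_samples: int,
                      sigma: float, lr: float, rng: random.Random) -> Vector:
    """One gradient-free step (num_samples must be >= 2): perturb the prompt with
    Gaussian noise, let rank_fn order the candidates (best first), and move along a
    rank-weighted average of the perturbations. Only the ordering is used, never a score."""
    noises = [[rng.gauss(0.0, 1.0) for _ in prompt] for _ in range(num_samples)]
    candidates = [[p + sigma * e for p, e in zip(prompt, noise)] for noise in noises]
    order = rank_fn(candidates)  # candidate indices, most preferred first
    # Centered rank weights: +0.5 for the best candidate, -0.5 for the worst, linear between.
    weights = [0.0] * num_samples
    for rank, idx in enumerate(order):
        weights[idx] = 0.5 - rank / (num_samples - 1)
    direction = [
        sum(w * noise[d] for w, noise in zip(weights, noises)) / (num_samples * sigma)
        for d in range(len(prompt))
    ]
    return [p + lr * g for p, g in zip(prompt, direction)]

# Hypothetical preference: candidates closer to a hidden target prompt are preferred.
TARGET = [1.0, -2.0, 0.5, 3.0]

def sq_dist(v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(v, TARGET))

def prefer_closer(candidates: List[Vector]) -> List[int]:
    return sorted(range(len(candidates)), key=lambda i: sq_dist(candidates[i]))

rng = random.Random(0)
prompt = [0.0, 0.0, 0.0, 0.0]
for _ in range(200):
    prompt = rank_based_update(prompt, prefer_closer, num_samples=20, sigma=0.1, lr=0.05, rng=rng)
```

The update consumes only the ordering of candidates, so `rank_fn` could be backed by human or learned pairwise preferences instead of a numeric return. The toy distance-based preference here just makes the example checkable.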
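The sparse-Tsallis entry above describes filtering out unlikely tokens. The sketch below assumes the standard result that regularizing with the sparse (q = 2) Tsallis entropy yields a sparsemax policy over temperature-scaled Q-values. Under that assumption, it shows why low-value tokens receive exactly zero probability. `sparse_tsallis_policy`, the temperature `alpha`, and the Q-values are hypothetical illustrations, not that paper's code.

```python
from typing import List, Sequence

def sparsemax(z: Sequence[float]) -> List[float]:
    """Euclidean projection of the score vector z onto the probability simplex.
    Entries whose score falls below the threshold tau get exactly zero mass."""
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k, support_sum = 0, 0.0
    for i, v in enumerate(zs, start=1):
        cumsum += v
        if 1.0 + i * v > cumsum:  # the i-th largest score is still inside the support
            k, support_sum = i, cumsum
    tau = (support_sum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]

def sparse_tsallis_policy(q_values: Sequence[float], alpha: float) -> List[float]:
    """Assumed policy form for sparse-Tsallis-regularized soft Q-learning: sparsemax(Q / alpha)."""
    return sparsemax([q / alpha for q in q_values])

# Hypothetical Q-values for four candidate prompt tokens.
probs = sparse_tsallis_policy([1.0, 0.8, -1.0, -2.0], alpha=1.0)
```

Softmax would give every token some probability. Here the two low-value tokens get none, and the remaining mass (about 0.6 and 0.4) is split between the top two.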