Prompt-Tuning Decision Transformer with Preference Ranking
- URL: http://arxiv.org/abs/2305.09648v1
- Date: Tue, 16 May 2023 17:49:04 GMT
- Title: Prompt-Tuning Decision Transformer with Preference Ranking
- Authors: Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao
- Abstract summary: We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling from a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
- Score: 83.76329715043205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt-tuning has emerged as a promising method for adapting pre-trained
models to downstream tasks or aligning with human preferences. Prompt learning
is widely used in NLP but has limited applicability to RL due to the complex
physical meaning and environment-specific information contained within RL
prompts. These factors require supervised learning to imitate the
demonstrations and may result in a loss of meaning after learning.
Additionally, directly extending prompt-tuning approaches to RL is challenging
because RL prompts guide agent behavior based on environmental modeling and
analysis, rather than filling in missing information, making it unlikely that
adjustments to the prompt format for downstream tasks, as in NLP, can yield
significant improvements. In this work, we propose the Prompt-Tuning DT
algorithm to address these challenges by using trajectory segments as prompts
to guide RL agents in acquiring environmental information and optimizing
prompts via black-box tuning to enhance their ability to contain more relevant
information, thereby enabling agents to make better decisions. Our approach
involves randomly sampling from a Gaussian distribution to fine-tune the elements
of the prompt trajectory and using a preference ranking function to find the
optimization direction, thereby providing more informative prompts and guiding
the agent towards specific preferences in the target environment. Extensive
experiments show that with only 0.03% of the parameters learned, Prompt-Tuning
DT achieves comparable or even better performance than full-model fine-tuning
in low-data scenarios. Our work contributes to the advancement of prompt-tuning
approaches in RL, providing a promising direction for optimizing large RL
agents for specific preference tasks.
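
Below is a minimal, self-contained sketch of the black-box prompt-tuning loop the abstract describes: perturb the prompt trajectory with Gaussian noise, rank the perturbed candidates with a preference (scoring) function, and step in the preferred direction. The rank-weighted update, the toy `preference_score`, and all shapes and hyper-parameters are illustrative assumptions for exposition, not the authors' implementation; in the paper's setting the score would come from rolling out the frozen Decision Transformer conditioned on the candidate prompt.

```python
import numpy as np

# Illustrative sketch of black-box prompt tuning with Gaussian perturbations
# and a preference-ranking step (assumed helpers, not the paper's code).

def preference_score(prompt: np.ndarray, target: np.ndarray) -> float:
    """Toy stand-in for the preference signal.

    In the paper's setting this would be obtained by rolling out the frozen
    pre-trained agent conditioned on `prompt`; here we simply reward
    closeness to a fixed `target` so the sketch runs end to end.
    """
    return -float(np.linalg.norm(prompt - target))

def prompt_tune(prompt, target, iters=200, pop=16, sigma=0.05, lr=0.1, rng=None):
    """Tune only the prompt elements; the pre-trained model stays frozen."""
    rng = np.random.default_rng(rng)
    prompt = prompt.copy()
    for _ in range(iters):
        # 1. Randomly sample Gaussian perturbations of the prompt trajectory.
        noise = rng.standard_normal((pop, *prompt.shape)) * sigma
        candidates = prompt[None] + noise

        # 2. Score and rank the candidates with the preference function.
        scores = np.array([preference_score(c, target) for c in candidates])
        ranks = scores.argsort().argsort().astype(float)  # 0 = worst

        # 3. Turn ranks into zero-mean weights and estimate an update
        #    direction (a simple ranking-weighted, evolution-strategies-style step).
        weights = (ranks - ranks.mean()) / (ranks.std() + 1e-8)
        weights = weights.reshape(-1, *([1] * prompt.ndim))
        prompt += lr * (weights * noise).mean(axis=0) / sigma
    return prompt

# Example: a 5-step prompt of (state, action, return-to-go) triples.
init = np.zeros((5, 3))
goal = np.ones((5, 3))
tuned = prompt_tune(init, goal, rng=0)
print(np.round(tuned, 2))  # drifts toward the all-ones target
```

Note that only the prompt is updated, which is consistent with the abstract's point that roughly 0.03% of the parameters are learned while the pre-trained model remains untouched.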
Related papers
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs [27.014415210732103]
We introduce Language Model Guided Trade-offs (LMGT), a novel, sample-efficient framework for Reinforcement Learning.
arXiv Detail & Related papers (2024-09-07T07:40:43Z) - Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of data gathered by the learning agent is dependent on the setting of many hyper-parameters.
In this work, a novel approach for autonomous hyper-parameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to other manual tweaking and optimization-based approaches.
arXiv Detail & Related papers (2021-12-15T13:10:44Z)