TEMPERA: Test-Time Prompting via Reinforcement Learning
- URL: http://arxiv.org/abs/2211.11890v1
- Date: Mon, 21 Nov 2022 22:38:20 GMT
- Title: TEMPERA: Test-Time Prompting via Reinforcement Learning
- Authors: Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E.
Gonzalez
- Abstract summary: We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA)
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves 5.33x on average improvement in sample efficiency when compared to the traditional fine-tuning methods.
- Score: 57.48657629588436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Careful prompt design is critical to the use of large language models in
zero-shot or few-shot learning. As a consequence, there is a growing interest
in automated methods to design optimal prompts. In this work, we propose
Test-time Prompt Editing using Reinforcement learning (TEMPERA). In contrast to
prior prompt generation methods, TEMPERA can efficiently leverage prior
knowledge, is adaptive to different queries and provides an interpretable
prompt for every query. To achieve this, we design a novel action space that
allows flexible editing of the initial prompts covering a wide set of
commonly-used components like instructions, few-shot exemplars, and
verbalizers. The proposed method achieves significant gains compared with
recent SoTA approaches like prompt tuning, AutoPrompt, and RLPrompt, across a
variety of tasks including sentiment analysis, topic classification, natural
language inference, and reading comprehension. Our method achieves 5.33x on
average improvement in sample efficiency when compared to the traditional
fine-tuning methods.
Related papers
- Large Language Models Prompting With Episodic Memory [53.8690170372303]
We propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities.
In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory.
Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks.
arXiv Detail & Related papers (2024-08-14T11:19:28Z) - Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
This approach brings the additional computational burden of model inference and human effort to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z) - PRE: Vision-Language Prompt Learning with Reparameterization Encoder [24.855142164168605]
Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks.
To attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions.
To avoid non-trivial prompt engineering, recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens.
arXiv Detail & Related papers (2023-09-14T14:48:01Z) - MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text
Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating few-shot text classification task into text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z) - ConsPrompt: Exploiting Contrastive Samples for Fewshot Prompt Learning [37.219617741198334]
We explore utilizing suitable contrastive samples and multi-degree contrastive learning methods to improve the robustness of the prompt representation.
Our results exhibit state-of-the-art performance in different few-shot settings, and the ablation experiments also certify the effectiveness of utilizing multi-degree contrastive learning in the prompt-based fine-tuning process.
arXiv Detail & Related papers (2022-11-08T09:29:45Z) - IDPG: An Instance-Dependent Prompt Generation Method [58.45110542003139]
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt in each input instance during the model training stage.
We propose a conditional prompt generation method to generate prompts for each input instance.
arXiv Detail & Related papers (2022-04-09T15:45:27Z) - Making Pre-trained Language Models End-to-end Few-shot Learners with
Contrastive Prompt Tuning [41.15017636192417]
We present CP-Tuning, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning Language Models.
It is integrated with the task-invariant continuous prompt encoding technique with fully trainable prompt parameters.
Experiments over a variety of language understanding tasks used in IR systems and different PLMs show that CP-Tuning outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-04-01T02:24:24Z) - Instance-aware Prompt Learning for Language Understanding and Generation [49.22899822734549]
We propose an instance-aware prompt learning method that learns a different prompt for each instance.
Our method achieves the state-of-the-art on the SuperGLUE few-shot learning benchmark.
arXiv Detail & Related papers (2022-01-18T17:03:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.