The Power of Scale for Parameter-Efficient Prompt Tuning
- URL: http://arxiv.org/abs/2104.08691v1
- Date: Sun, 18 Apr 2021 03:19:26 GMT
- Title: The Power of Scale for Parameter-Efficient Prompt Tuning
- Authors: Brian Lester, Rami Al-Rfou, Noah Constant
- Abstract summary: "Prompt tuning" is a simple mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks.
Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin.
- Score: 4.481348281462904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we explore "prompt tuning", a simple yet effective mechanism
for learning "soft prompts" to condition frozen language models to perform
specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft
prompts are learned through backpropagation and can be tuned to incorporate
signal from any number of labeled examples. Our end-to-end learned approach
outperforms GPT-3's "few-shot" learning by a large margin. More remarkably,
through ablations on model size using T5, we show that prompt tuning becomes
more competitive with scale: as models exceed billions of parameters, our
method "closes the gap" and matches the strong performance of model tuning
(where all model weights are tuned). This finding is especially relevant in
that large models are costly to share and serve, and the ability to reuse one
frozen model for multiple downstream tasks can ease this burden. Our method can
be seen as a simplification of the recently proposed "prefix tuning" of Li and
Liang (2021), and we provide a comparison to this and other similar approaches.
Finally, we show that conditioning a frozen model with soft prompts confers
benefits in robustness to domain transfer, as compared to full model tuning.
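For a concrete picture of the mechanism described in the abstract, the sketch below shows one way soft-prompt conditioning can be wired up: a small trainable matrix of prompt embeddings is prepended to the input embeddings of a frozen model, and only that matrix receives gradients. This is a hypothetical illustration, not the authors' released T5 implementation; the wrapper class name, the default prompt length of 20, and the assumption of a HuggingFace-style model that accepts inputs_embeds are all assumptions made here.

```python
# Minimal prompt-tuning sketch (hypothetical, PyTorch-style).
import torch
import torch.nn as nn


class SoftPromptWrapper(nn.Module):
    def __init__(self, frozen_model, embed_dim, prompt_length=20):
        super().__init__()
        self.frozen_model = frozen_model
        for p in self.frozen_model.parameters():
            p.requires_grad = False  # all language-model weights stay frozen
        # The soft prompt is the only trainable tensor: prompt_length x embed_dim.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds, attention_mask, **kwargs):
        batch_size = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the learned prompt embeddings and extend the attention mask.
        embeds = torch.cat([prompt, input_embeds], dim=1)
        prompt_mask = torch.ones(
            batch_size, prompt.size(1),
            dtype=attention_mask.dtype, device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        # Assumes a HuggingFace-style model that accepts inputs_embeds.
        return self.frozen_model(inputs_embeds=embeds, attention_mask=mask, **kwargs)


# Training would optimize only the soft prompt, e.g.:
# optimizer = torch.optim.Adam([wrapper.soft_prompt], lr=0.3)
```

Because the backbone never changes, a single frozen copy of the model can serve many tasks, each represented only by its own small prompt matrix.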
Related papers
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z) - Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z) - XPrompt: Exploring the Extreme of Prompt Tuning [31.242680485717447]
We propose a novel prompt tuning model with an eXtremely small scale (XPrompt) under the regime of the lottery ticket hypothesis.
XPrompt eliminates negative prompt tokens at different levels through hierarchical structured pruning, yielding a more parameter-efficient prompt while maintaining competitive performance.
arXiv Detail & Related papers (2022-10-10T06:57:19Z) - Prompt Tuning for Generative Multimodal Pretrained Models [75.44457974275154]
We implement prompt tuning on a unified sequence-to-sequence pretrained model that handles both understanding and generation tasks.
Experimental results demonstrate that lightweight prompt tuning can achieve performance comparable to finetuning.
In comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks.
arXiv Detail & Related papers (2022-08-04T08:56:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.