The Power of Scale for Parameter-Efficient Prompt Tuning
- URL: http://arxiv.org/abs/2104.08691v1
- Date: Sun, 18 Apr 2021 03:19:26 GMT
- Title: The Power of Scale for Parameter-Efficient Prompt Tuning
- Authors: Brian Lester, Rami Al-Rfou, Noah Constant
- Abstract summary: "Prompt tuning" is a simple mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks.
Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin.
- Score: 4.481348281462904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we explore "prompt tuning", a simple yet effective mechanism
for learning "soft prompts" to condition frozen language models to perform
specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft
prompts are learned through backpropagation and can be tuned to incorporate
signal from any number of labeled examples. Our end-to-end learned approach
outperforms GPT-3's "few-shot" learning by a large margin. More remarkably,
through ablations on model size using T5, we show that prompt tuning becomes
more competitive with scale: as models exceed billions of parameters, our
method "closes the gap" and matches the strong performance of model tuning
(where all model weights are tuned). This finding is especially relevant in
that large models are costly to share and serve, and the ability to reuse one
frozen model for multiple downstream tasks can ease this burden. Our method can
be seen as a simplification of the recently proposed "prefix tuning" of Li and
Liang (2021), and we provide a comparison to this and other similar approaches.
Finally, we show that conditioning a frozen model with soft prompts confers
benefits in robustness to domain transfer, as compared to full model tuning.
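For a concrete picture of the mechanism described in the abstract, the sketch below shows one way soft-prompt conditioning can be wired up: a small trainable matrix of prompt embeddings is prepended to the input embeddings of a frozen model, and only that matrix receives gradients. This is a hypothetical illustration, not the authors' released T5 implementation; the wrapper class name, the default prompt length of 20, and the assumption of a HuggingFace-style model that accepts inputs_embeds are all assumptions made here.

```python
# Minimal prompt-tuning sketch (hypothetical, PyTorch-style).
import torch
import torch.nn as nn


class SoftPromptWrapper(nn.Module):
    def __init__(self, frozen_model, embed_dim, prompt_length=20):
        super().__init__()
        self.frozen_model = frozen_model
        for p in self.frozen_model.parameters():
            p.requires_grad = False  # all language-model weights stay frozen
        # The soft prompt is the only trainable tensor: prompt_length x embed_dim.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds, attention_mask, **kwargs):
        batch_size = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the learned prompt embeddings and extend the attention mask.
        embeds = torch.cat([prompt, input_embeds], dim=1)
        prompt_mask = torch.ones(
            batch_size, prompt.size(1),
            dtype=attention_mask.dtype, device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        # Assumes a HuggingFace-style model that accepts inputs_embeds.
        return self.frozen_model(inputs_embeds=embeds, attention_mask=mask, **kwargs)


# Training would optimize only the soft prompt, e.g.:
# optimizer = torch.optim.Adam([wrapper.soft_prompt], lr=0.3)
```

Because the backbone never changes, a single frozen copy of the model can serve many tasks, each represented only by its own small prompt matrix.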
Related papers
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z) - Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z) - XPrompt: Exploring the Extreme of Prompt Tuning [31.242680485717447]
We propose a novel prompt tuning model with an eXtremely small scale (XPrompt) under the regime of the lottery ticket hypothesis.
XPrompt eliminates negative prompt tokens at different levels through hierarchical structured pruning, yielding a more parameter-efficient prompt while maintaining competitive performance.
arXiv Detail & Related papers (2022-10-10T06:57:19Z) - Prompt Tuning for Generative Multimodal Pretrained Models [75.44457974275154]
We implement prompt tuning on a unified sequence-to-sequence pretrained model that handles both understanding and generation tasks.
Experimental results demonstrate that lightweight prompt tuning can achieve performance comparable to finetuning.
In comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks.
arXiv Detail & Related papers (2022-08-04T08:56:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.