Scaled Prompt-Tuning for Few-Shot Natural Language Generation
- URL: http://arxiv.org/abs/2309.06759v1
- Date: Wed, 13 Sep 2023 07:12:31 GMT
- Title: Scaled Prompt-Tuning for Few-Shot Natural Language Generation
- Authors: Ting Hu, Christoph Meinel, Haojin Yang
- Abstract summary: Increasingly large Language Models (LLMs) demonstrate stronger language understanding and generation capabilities.
However, the memory demand and computation cost of fine-tuning LLMs on downstream tasks are non-negligible.
We propose a Scaled Prompt-Tuning (SPT) method that surpasses conventional prompt tuning (PT) with better performance and generalization ability.
- Score: 9.399840807973545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increasingly large Language Models (LLMs) demonstrate stronger language
understanding and generation capabilities, but the memory demand and
computation cost of fine-tuning LLMs on downstream tasks are non-negligible.
Moreover, fine-tuning generally requires a certain amount of data from each
individual task, and the cost of data collection is another issue to consider in
real-world applications. In this work, we focus on Parameter-Efficient
Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG),
which freeze most parameters in LLMs and tune a small subset of parameters in
few-shot cases so that memory footprint, training cost, and labeling cost are
reduced while maintaining or even improving performance. We propose a
Scaled Prompt-Tuning (SPT) method that surpasses conventional prompt tuning (PT)
with better performance and generalization ability but without a noticeable
increase in training cost. A further study on intermediate SPT suggests the superior
transferability of SPT in few-shot scenarios, providing a recipe for
data-deficient and computation-limited circumstances. Moreover, a comprehensive
comparison of existing PEFT methods reveals that certain approaches that exhibit
decent performance with modest training cost in prior studies, such as Prefix-Tuning,
could struggle in few-shot NLG tasks, especially on challenging datasets.
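The abstract does not spell out how SPT differs from conventional prompt tuning, so the following is only a minimal sketch of soft-prompt tuning extended with a learnable scaling factor, which is one plausible reading of "scaled" prompt tuning. The class name, argument names, and scaling scheme are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch: conventional prompt tuning plus a learnable scale,
# trained while the backbone LM stays frozen. All names here are assumptions.
import torch
import torch.nn as nn


class ScaledSoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int, init_scale: float = 1.0):
        super().__init__()
        # Learnable soft-prompt embeddings, as in conventional prompt tuning.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)
        # Additional learnable scaling factor (the assumed "scaled" component).
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen LM's embedding layer.
        batch = input_embeds.size(0)
        prompt = (self.scale * self.prompt).unsqueeze(0).expand(batch, -1, -1)
        # Prepend the scaled soft prompt to the token embeddings.
        return torch.cat([prompt, input_embeds], dim=1)


# Usage sketch (hypothetical): freeze the backbone and train only the soft prompt.
# for p in backbone.parameters():
#     p.requires_grad = False
# soft_prompt = ScaledSoftPrompt(num_virtual_tokens=20, embed_dim=backbone.config.d_model)
# inputs_embeds = soft_prompt(backbone.get_input_embeddings()(input_ids))
# Remember to pad the attention mask with ones for the prepended prompt positions.
```
Only the soft-prompt parameters receive gradients, so the memory footprint and training cost stay close to those of conventional prompt tuning.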
Related papers
- Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models [18.877891285367216]
A class of parameter-efficient fine-tuning (PEFT) methods aims to mitigate computational challenges by selectively fine-tuning only a small fraction of the model parameters.
We introduce $\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters.
We analytically show that $\text{ID}^3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency.
arXiv Detail & Related papers (2024-08-26T17:58:53Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Leveraging Zero-Shot Prompting for Efficient Language Model Distillation [3.4205390087622582]
This paper introduces a novel approach for efficiently distilling LLMs into smaller, application-specific models.
It utilizes LLMs' reasoning capabilities to generate labels and natural language rationales for unlabeled data.
Key contributions include the employment of zero-shot prompting to elicit teacher model rationales.
arXiv Detail & Related papers (2024-03-23T16:51:52Z) - An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z) - Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models:
A Critical Review and Assessment [12.674032145667763]
We present a comprehensive and systematic review of Parameter-Efficient Fine-Tuning (PEFT) methods for pretrained language models (PLMs).
PEFT offers an effective solution by reducing the number of fine-tuning parameters and memory usage while achieving comparable performance to full fine-tuning.
We conduct experiments using several representative PEFT methods to better understand their effectiveness in parameter efficiency and memory efficiency.
arXiv Detail & Related papers (2023-12-19T13:31:24Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)