Scaled Prompt-Tuning for Few-Shot Natural Language Generation
- URL: http://arxiv.org/abs/2309.06759v1
- Date: Wed, 13 Sep 2023 07:12:31 GMT
- Title: Scaled Prompt-Tuning for Few-Shot Natural Language Generation
- Authors: Ting Hu, Christoph Meinel, Haojin Yang
- Abstract summary: Increasingly large Language Models (LLMs) demonstrate stronger language understanding and generation capabilities.
However, the memory demand and computation cost of fine-tuning LLMs on downstream tasks are non-negligible.
We propose a Scaled Prompt-Tuning (SPT) method that surpasses conventional prompt tuning (PT) with better performance and generalization ability.
- Score: 9.399840807973545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increasingly large Language Models (LLMs) demonstrate stronger language
understanding and generation capabilities, but the memory demand and
computation cost of fine-tuning LLMs on downstream tasks are non-negligible.
Moreover, fine-tuning generally requires a certain amount of data from each
individual task, and the cost of data collection is another issue to consider in
real-world applications. In this work, we focus on Parameter-Efficient
Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG),
which freeze most parameters in LLMs and tune a small subset of parameters in
few-shot cases so that memory footprint, training cost, and labeling cost are
reduced while maintaining or even improving performance. We propose a
Scaled Prompt-Tuning (SPT) method that surpasses conventional prompt tuning (PT)
with better performance and generalization ability but without a noticeable
increase in training cost. A further study on intermediate SPT suggests the superior
transferability of SPT in few-shot scenarios, providing a recipe for
data-deficient and computation-limited circumstances. Moreover, a comprehensive
comparison of existing PEFT methods reveals that certain approaches that exhibit
decent performance with modest training cost in prior studies, such as Prefix-Tuning,
could struggle in few-shot NLG tasks, especially on challenging datasets.
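The abstract does not spell out how SPT differs from conventional prompt tuning, so the following is only a minimal sketch of soft-prompt tuning extended with a learnable scaling factor, which is one plausible reading of "scaled" prompt tuning. The class name, argument names, and scaling scheme are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch: conventional prompt tuning plus a learnable scale,
# trained while the backbone LM stays frozen. All names here are assumptions.
import torch
import torch.nn as nn


class ScaledSoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int, init_scale: float = 1.0):
        super().__init__()
        # Learnable soft-prompt embeddings, as in conventional prompt tuning.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)
        # Additional learnable scaling factor (the assumed "scaled" component).
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen LM's embedding layer.
        batch = input_embeds.size(0)
        prompt = (self.scale * self.prompt).unsqueeze(0).expand(batch, -1, -1)
        # Prepend the scaled soft prompt to the token embeddings.
        return torch.cat([prompt, input_embeds], dim=1)


# Usage sketch (hypothetical): freeze the backbone and train only the soft prompt.
# for p in backbone.parameters():
#     p.requires_grad = False
# soft_prompt = ScaledSoftPrompt(num_virtual_tokens=20, embed_dim=backbone.config.d_model)
# inputs_embeds = soft_prompt(backbone.get_input_embeddings()(input_ids))
# Remember to pad the attention mask with ones for the prepended prompt positions.
```
Only the soft-prompt parameters receive gradients, so the memory footprint and training cost stay close to those of conventional prompt tuning.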
Related papers
- Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models [18.877891285367216]
A class of parameter-efficient fine-tuning (PEFT) methods aims to mitigate computational challenges by selectively fine-tuning only a small fraction of the model parameters.
We introduce $\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters.
We analytically show that $\text{ID}^3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency.
arXiv Detail & Related papers (2024-08-26T17:58:53Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Leveraging Zero-Shot Prompting for Efficient Language Model Distillation [3.4205390087622582]
This paper introduces a novel approach for efficiently distilling LLMs into smaller, application-specific models.
It utilizes LLMs' reasoning capabilities to generate labels and natural language rationales for unlabeled data.
Key contributions include the employment of zero-shot prompting to elicit teacher model rationales.
arXiv Detail & Related papers (2024-03-23T16:51:52Z) - An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z) - Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models:
A Critical Review and Assessment [12.674032145667763]
We present a comprehensive and systematic review of Parameter-Efficient Fine-Tuning (PEFT) methods for pretrained language models (PLMs).
PEFT offers an effective solution by reducing the number of fine-tuning parameters and memory usage while achieving comparable performance to full fine-tuning.
We conduct experiments using several representative PEFT methods to better understand their effectiveness in parameter efficiency and memory efficiency.
arXiv Detail & Related papers (2023-12-19T13:31:24Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)