LiST: Lite Self-training Makes Efficient Few-shot Learners
- URL: http://arxiv.org/abs/2110.06274v1
- Date: Tue, 12 Oct 2021 18:47:18 GMT
- Title: LiST: Lite Self-training Makes Efficient Few-shot Learners
- Authors: Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed
Hassan Awadallah, Jianfeng Gao
- Abstract summary: LiST improves by 35% over classic fine-tuning methods and 6% over prompt-tuning with 96% reduction in number of trainable parameters when fine-tuned with no more than 30 labeled examples from each target domain.
- Score: 91.28065455714018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new method LiST for efficient fine-tuning of large pre-trained
language models (PLMs) in few-shot learning settings. LiST significantly
improves over recent methods that adopt prompt fine-tuning using two key
techniques. The first one is the use of self-training to leverage large amounts
of unlabeled data for prompt-tuning to significantly boost the model
performance in few-shot settings. We use self-training in conjunction with
meta-learning for re-weighting noisy pseudo-prompt labels. However, traditional
self-training is expensive as it requires updating all the model parameters
repetitively. Therefore, we use a second technique for light-weight fine-tuning
where we introduce a small number of task-specific adapter parameters that are
fine-tuned during self-training while keeping the PLM encoder frozen. This also
significantly reduces the overall model footprint across several tasks that can
now share a common PLM encoder as backbone for inference. Combining the above
techniques, LiST not only improves the model performance for few-shot learning
on target domains but also reduces the model memory footprint. We present a
comprehensive study on six NLU tasks to validate the effectiveness of LiST. The
results show that LiST improves by 35% over classic fine-tuning methods and 6%
over prompt-tuning with 96% reduction in number of trainable parameters when
fine-tuned with no more than 30 labeled examples from each target domain.
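To make the two ingredients above concrete (a frozen PLM encoder with small trainable adapters, and self-training that re-weights noisy pseudo-labels), here is a minimal PyTorch-style sketch. It is an illustration only, not the authors' implementation: the names (Adapter, freeze_encoder_add_adapters, self_training_step), the bottleneck size, the layer count, and the confidence-threshold re-weighting are all assumptions; LiST itself learns the re-weighting via meta-learning rather than a fixed threshold.
```python
# Hypothetical sketch of LiST-style lightweight self-training (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Bottleneck adapter: the only task-specific trainable parameters."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen encoder's representation intact.
        return hidden_states + self.up(F.gelu(self.down(hidden_states)))


def freeze_encoder_add_adapters(encoder: nn.Module, hidden_size: int, num_layers: int = 12) -> nn.ModuleList:
    """Freeze every encoder parameter; return fresh adapters to train instead."""
    for param in encoder.parameters():
        param.requires_grad = False
    # One adapter per encoder layer (layer count is an assumption).
    return nn.ModuleList(Adapter(hidden_size) for _ in range(num_layers))


def self_training_step(student_logits: torch.Tensor, teacher_logits: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """One re-weighted pseudo-label loss on an unlabeled batch.

    The teacher's confident predictions become hard pseudo-labels; examples
    below the confidence threshold get zero weight (a simple stand-in for the
    paper's meta-learned re-weighting of noisy pseudo-prompt labels).
    """
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=-1)
        confidence, pseudo_labels = probs.max(dim=-1)
        weights = (confidence >= threshold).float()
    per_example_loss = F.cross_entropy(student_logits, pseudo_labels, reduction="none")
    return (weights * per_example_loss).sum() / weights.sum().clamp(min=1.0)
```
In use, only the adapters (and a task head) would receive gradients, so several tasks can share one frozen encoder at inference time, which is the footprint reduction the abstract refers to.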
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
- Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping [53.454408491386886]
Bootstrapping self-alignment markedly surpasses the single-round approach.
We propose Step-On-Feet Tuning (SOFT), which leverages the model's continuously improving few-shot ability to boost zero- and one-shot performance.
Building on an easy-to-hard training recipe, we propose SOFT+, which further boosts self-alignment performance.
arXiv Detail & Related papers (2024-02-12T12:30:42Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism for training small models that outperform large language models.
We present three findings across four NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)
- Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning [30.65315081964461]
We study few-shot learning with pretrained language models (PLMs) from a different perspective.
We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples.
Our approach, FewGen, achieves better overall results than existing few-shot learning methods across seven classification tasks of the GLUE benchmark.
arXiv Detail & Related papers (2022-11-06T06:46:47Z)
- LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements more substantially than existing PETL methods.
arXiv Detail & Related papers (2022-06-13T23:51:56Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- AttentionLite: Towards Efficient Self-Attention Models for Vision [9.957033392865982]
We propose a novel framework for producing a class of parameter- and compute-efficient models, called AttentionLite, suitable for resource-constrained applications.
We can simultaneously distill knowledge from a compute-heavy teacher while also pruning the student model in a single pass of training.
arXiv Detail & Related papers (2020-12-21T17:54:09Z)