Tuning Language Models as Training Data Generators for
Augmentation-Enhanced Few-Shot Learning
- URL: http://arxiv.org/abs/2211.03044v2
- Date: Fri, 12 May 2023 06:06:13 GMT
- Title: Tuning Language Models as Training Data Generators for
Augmentation-Enhanced Few-Shot Learning
- Authors: Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher,
Jiawei Han
- Abstract summary: We study few-shot learning with pretrained language models (PLMs) from a different perspective.
We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples.
Our approach FewGen achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have revealed the intriguing few-shot learning ability of
pretrained language models (PLMs): They can quickly adapt to a new task when
fine-tuned on a small amount of labeled data formulated as prompts, without
requiring abundant task-specific annotations. Despite their promising
performance, most existing few-shot approaches that only learn from the small
training set still underperform fully supervised training by nontrivial
margins. In this work, we study few-shot learning with PLMs from a different
perspective: We first tune an autoregressive PLM on the few-shot samples and
then use it as a generator to synthesize a large amount of novel training
samples which augment the original training set. To encourage the generator to
produce label-discriminative samples, we train it via weighted maximum
likelihood where the weight of each token is automatically adjusted based on a
discriminative meta-learning objective. A classification PLM can then be
fine-tuned on both the few-shot and the synthetic samples with regularization
for better generalization and stability. Our approach FewGen achieves an
overall better result across seven classification tasks of the GLUE benchmark
than existing few-shot learning methods, improving no-augmentation methods by
5+ average points, and outperforming augmentation methods by 3+ average points.
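The generator objective described above lends itself to a compact sketch. Below is a minimal PyTorch illustration of the weighted maximum-likelihood loss; the function and tensor names are hypothetical, and the token weights are taken as given rather than derived from FewGen's discriminative meta-learning objective, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def weighted_mle_loss(logits, targets, token_weights):
    """Per-token weighted maximum likelihood for the generator.

    logits:        (batch, seq_len, vocab) autoregressive LM outputs
    targets:       (batch, seq_len) gold next-token ids
    token_weights: (batch, seq_len) per-token weights; in FewGen these are
                   adjusted by a meta-learning objective, here assumed given
    """
    # token-level cross-entropy without reduction, so weights can be applied
    ce = F.cross_entropy(
        logits.transpose(1, 2),   # (batch, vocab, seq_len)
        targets,
        reduction="none",
    )                             # (batch, seq_len)
    return (token_weights * ce).sum() / token_weights.sum()
```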
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns importance weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
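As a rough illustration of weighting by training dynamics: the rule below, mean confidence in the distant label across epochs, is an assumption for the sketch and not the paper's exact weighting.

```python
import torch

def dynamics_weights(prob_history: torch.Tensor) -> torch.Tensor:
    """Importance weight per distantly-labeled example.

    prob_history: (epochs, n) probability the classifier assigned to the
                  distant label at each epoch; the mean over epochs is one
                  common training-dynamics signal (illustrative only).
    """
    return prob_history.mean(dim=0)  # (n,) weights in [0, 1]
```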
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
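A minimal sketch of per-sample adaptive label smoothing, assuming a precomputed uncertainty score in [0, 1] per sample (how UAL estimates uncertainty is not covered in this summary):

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_smoothing=0.2):
    """Cross-entropy with label smoothing scaled per sample by uncertainty.

    logits:      (batch, classes)
    targets:     (batch,) class ids
    uncertainty: (batch,) scores in [0, 1]; higher -> more smoothing
    """
    eps = max_smoothing * uncertainty                    # (batch,)
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    uniform = -log_probs.mean(dim=-1)                    # uniform-target term
    return ((1.0 - eps) * nll + eps * uniform).mean()
```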
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
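One common closed form for KL-regularized, instance-reweighted DRO is a softmax over per-sample losses; the snippet below uses that form purely as an illustration, and the paper's exact formulation may differ.

```python
import torch

def irdro_batch_loss(per_sample_losses: torch.Tensor, tau: float = 1.0):
    """Upweight hard samples: w_i proportional to exp(loss_i / tau),
    a standard closed-form solution for KL-regularized DRO (illustrative)."""
    weights = torch.softmax(per_sample_losses.detach() / tau, dim=0)
    return (weights * per_sample_losses).sum()
```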
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Learning New Tasks from a Few Examples with Soft-Label Prototypes [18.363177410917597]
We propose a novel few-shot learning approach based on soft-label prototypes (SLPs).
We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class.
We experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting.
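A minimal sketch of classifying with soft-label prototypes, assuming the prototypes and their soft label distributions are already constructed (the construction procedure is the paper's contribution and is not reproduced here):

```python
import torch

def slp_classify(x, prototypes, soft_labels):
    """x: (n, d) inputs; prototypes: (p, d); soft_labels: (p, classes).

    Nearer prototypes contribute more, and each carries a soft label
    distribution rather than a single hard class (illustrative).
    """
    dists = torch.cdist(x, prototypes)         # (n, p)
    weights = torch.softmax(-dists, dim=-1)    # distance-based weighting
    return weights @ soft_labels               # (n, classes) class scores
```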
arXiv Detail & Related papers (2022-10-31T16:06:48Z)
- Gradient-Based Meta-Learning Using Uncertainty to Weigh Loss for Few-Shot Learning [5.691930884128995]
Model-Agnostic Meta-Learning (MAML) is one of the most successful meta-learning techniques for few-shot learning.
A new method is proposed in which the task-specific learner adaptively learns to select parameters that minimize the loss on new tasks.
Method 1 generates weights by comparing meta-loss differences, improving accuracy when there are few classes.
Method 2 introduces the homoscedastic uncertainty of each task to weigh the multiple losses based on the original gradient descent.
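Homoscedastic-uncertainty loss weighting is commonly implemented in the style of Kendall et al.; here is a sketch under that assumption (parameter names hypothetical, not the paper's exact variant):

```python
import torch

class UncertaintyWeightedLoss(torch.nn.Module):
    """total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2)
    is a learned per-task log-variance (Kendall-style; illustrative)."""

    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_var = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: torch.Tensor) -> torch.Tensor:
        # task_losses: (num_tasks,) individual task losses
        return (torch.exp(-self.log_var) * task_losses + self.log_var).sum()
```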
arXiv Detail & Related papers (2022-08-17T08:11:51Z)
- ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning [97.2907428983142]
ZeroGen attempts to use a PLM purely to generate data and train a tiny model, without relying on any task-specific annotation.
We propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.
arXiv Detail & Related papers (2022-05-25T11:38:48Z)
- Generating Training Data with Language Models: Towards Zero-Shot Language Understanding [35.92571138322246]
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks.
We present a simple approach that uses both unidirectional and bidirectional PLMs for fully zero-shot learning of NLU tasks.
Our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark.
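The generation step can be sketched with off-the-shelf tooling; the prompts and model below are placeholders, not the paper's actual setup.

```python
from transformers import pipeline

# Hypothetical label-descriptive prompts for a binary sentiment task
prompts = {
    "positive": "Write a glowing movie review:",
    "negative": "Write a scathing movie review:",
}

generator = pipeline("text-generation", model="gpt2")

synthetic = []
for label, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        top_k=50, num_return_sequences=3)
    for out in outputs:
        text = out["generated_text"][len(prompt):].strip()
        synthetic.append({"text": text, "label": label})
# `synthetic` can then be used to fine-tune a bidirectional PLM classifier
```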
arXiv Detail & Related papers (2022-02-09T16:02:18Z)
- LiST: Lite Self-training Makes Efficient Few-shot Learners [91.28065455714018]
LiST improves over classic fine-tuning methods by 35% and over prompt-tuning by 6%, with a 96% reduction in the number of trainable parameters, when fine-tuned with no more than 30 labeled examples from each target domain.
arXiv Detail & Related papers (2021-10-12T18:47:18Z)
- Few Is Enough: Task-Augmented Active Meta-Learning for Brain Cell Classification [8.998976678920236]
We propose a tAsk-auGmented actIve meta-LEarning (AGILE) method to efficiently adapt Deep Neural Networks to new tasks.
AGILE combines a meta-learning algorithm with a novel task augmentation technique which we use to generate an initial adaptive model.
We show that the proposed task-augmented meta-learning framework can learn to classify new cell types after a single gradient step.
arXiv Detail & Related papers (2020-07-09T18:03:12Z)
- To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks [25.05882459314221]
We show that as the number of training examples grows into the millions, the accuracy gap between fine-tuning a BERT-based model and training a vanilla LSTM from scratch narrows to within 1%.
Our findings indicate that pre-trained models might reach a diminishing return point as the supervised data size increases significantly.
arXiv Detail & Related papers (2020-06-15T18:18:59Z)