On Training Instance Selection for Few-Shot Neural Text Generation
- URL: http://arxiv.org/abs/2107.03176v1
- Date: Wed, 7 Jul 2021 12:16:16 GMT
- Title: On Training Instance Selection for Few-Shot Neural Text Generation
- Authors: Ernie Chang, Xiaoyu Shen, Hui-Syuan Yeh, Vera Demberg
- Abstract summary: We present a study on training instance selection in few-shot neural text generation.
We propose a simple selection strategy with K-means clustering.
We show that the generation models consistently outperform random sampling on three text generation tasks.
- Score: 9.37935464602938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale pretrained language models have led to dramatic improvements in
text generation. Impressive performance can be achieved by finetuning only on a
small number of instances (few-shot setting). Nonetheless, almost all previous
work simply applies random sampling to select the few-shot training instances.
Little to no attention has been paid to the selection strategies and how they
would affect model performance. In this work, we present a study on training
instance selection in few-shot neural text generation. The selection decision
is made based only on the unlabeled data so as to identify the most worthwhile
data points that should be annotated under some budget of labeling cost. Based
on the intuition that the few-shot training instances should be diverse and
representative of the entire data distribution, we propose a simple selection
strategy with K-means clustering. We show that even with the naive
clustering-based approach, the generation models consistently outperform random
sampling on three text generation tasks: data-to-text generation, document
summarization and question generation. We hope that this work will call for
more attention on this largely unexplored area.
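The selection strategy described above can be illustrated with a short sketch: embed the unlabeled pool, run K-means with K equal to the annotation budget, and annotate the instance closest to each cluster centroid so that the selected set is both diverse and representative. This is a minimal illustration under assumed details, not the authors' released code; the use of scikit-learn, the nearest-to-centroid selection rule, and the helper name select_instances are assumptions.

```python
# Minimal sketch: K-means-based training instance selection (assumed details).
import numpy as np
from sklearn.cluster import KMeans


def select_instances(embeddings: np.ndarray, budget: int, seed: int = 0) -> list:
    """Pick `budget` diverse, representative indices from an unlabeled pool.

    `embeddings` is an (N, d) array of vector representations of the
    unlabeled instances (e.g., from a pretrained sentence encoder).
    """
    km = KMeans(n_clusters=budget, random_state=seed, n_init=10).fit(embeddings)
    selected = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]  # indices of instances in cluster c
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))  # point closest to the centroid
    return selected
```

The returned indices would then be the instances sent for annotation; models fine-tuned on them are compared against fine-tuning on a randomly sampled set of the same size.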
Related papers
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - Target-Aware Language Modeling via Granular Data Sampling [25.957424920194914]
Language model pretraining generally targets a broad range of use cases and incorporates data from diverse sources.
A cost-effective and straightforward approach is sampling with low-dimensional data features.
We show that models pretrained on the sampled data perform on par with those trained on the full RefinedWeb data and outperform models trained on randomly selected samples for model sizes ranging from 125M to 1.5B.
arXiv Detail & Related papers (2024-09-23T04:52:17Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models [37.39843935632105]
We propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples.
Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt.
arXiv Detail & Related papers (2023-07-13T12:11:36Z) - Meta-learning Pathologies from Radiology Reports using Variance Aware Prototypical Networks [3.464871689508835]
We propose a simple extension of the Prototypical Networks for few-shot text classification.
Our main idea is to replace the class prototypes by Gaussians and introduce a regularization term that encourages the examples to be clustered near the appropriate class centroids.
arXiv Detail & Related papers (2022-10-22T05:22:29Z) - Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by large margins in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes random sampling strategy in acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories while selecting which examples to annotate under a limited labeling budget.
arXiv Detail & Related papers (2022-01-07T02:46:35Z) - Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z) - The SelectGen Challenge: Finding the Best Training Samples for Few-Shot Neural Text Generation [11.534198637625208]
We propose a shared task on training instance selection for few-shot neural text generation.
The study of the selection strategy can help us to (1) make the most of our annotation budget in downstream tasks and (2) better benchmark few-shot text generation models.
arXiv Detail & Related papers (2021-08-14T21:20:35Z) - KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.