ZeroGen: Efficient Zero-shot Learning via Dataset Generation
- URL: http://arxiv.org/abs/2202.07922v1
- Date: Wed, 16 Feb 2022 08:18:02 GMT
- Title: ZeroGen: Efficient Zero-shot Learning via Dataset Generation
- Authors: Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
- Abstract summary: We study a flexible and efficient zero-shot learning method, ZeroGen.
Given a zero-shot task, we first generate a dataset from scratch using PLMs in an unsupervised manner.
Experiments and analysis on different NLP tasks (text classification, question answering, and natural language inference) show the effectiveness of ZeroGen.
- Score: 28.454620513642034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing interest in dataset generation recently due to the
superior generative capacity of large pre-trained language models (PLMs). In
this paper, we study a flexible and efficient zero-shot learning method,
ZeroGen. Given a zero-shot task, we first generate a dataset from scratch using
PLMs in an unsupervised manner. Then, we train a tiny task model (e.g., LSTM)
under the supervision of the synthesized dataset. This approach allows highly
efficient inference as the final task model has orders of magnitude fewer
parameters than PLMs (e.g., GPT2-XL). Apart from being annotation-free
and efficient, we argue that ZeroGen can also provide useful insights from the
perspective of data-free model-agnostic knowledge distillation, and
unreferenced text generation evaluation. Experiments and analysis on different
NLP tasks (text classification, question answering, and natural
language inference) show the effectiveness of ZeroGen.
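The two-stage pipeline in the abstract (generate a labeled dataset from a PLM with no annotation, then train a tiny task model on it) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption for illustration: `PROMPTS` are hypothetical label-conditioned prompts rather than the paper's templates, `toy_plm_generate` is a hypothetical toy sampler standing in for a real PLM such as GPT2-XL, and the tiny task model is a multinomial naive Bayes classifier swapped in for the LSTM so the example runs with the standard library alone.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical label-conditioned prompts (illustrative, not the paper's templates).
PROMPTS = {
    "positive": 'The movie review in positive sentiment is: "',
    "negative": 'The movie review in negative sentiment is: "',
}

# Toy stand-in for a PLM's continuation distribution, keyed by prompt (assumption).
TOY_CONTINUATIONS = {
    PROMPTS["positive"]: ["great", "wonderful", "moving", "brilliant", "fun"],
    PROMPTS["negative"]: ["boring", "awful", "dull", "terrible", "weak"],
}
NEUTRAL_WORDS = ["film", "plot", "acting"]


def toy_plm_generate(prompt, rng):
    """Hypothetical stand-in for sampling a continuation from a PLM."""
    words = rng.choices(TOY_CONTINUATIONS[prompt], k=4) + rng.choices(NEUTRAL_WORDS, k=2)
    rng.shuffle(words)
    return " ".join(words)


def synthesize_dataset(n_per_label, seed=0):
    """Stage 1: build a labeled dataset purely from prompted generations."""
    rng = random.Random(seed)
    return [
        (toy_plm_generate(prompt, rng), label)
        for label, prompt in PROMPTS.items()
        for _ in range(n_per_label)
    ]


def train_naive_bayes(dataset):
    """Stage 2: fit a tiny task model (multinomial naive Bayes, not an LSTM)."""
    label_counts = Counter(label for _, label in dataset)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in dataset:
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest Laplace-smoothed log posterior."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for tok in text.split():
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


synthetic = synthesize_dataset(n_per_label=50)
model = train_naive_bayes(synthetic)
print(predict(model, "a brilliant and moving film"))  # expected: positive
print(predict(model, "dull and boring plot"))  # expected: negative
```

Replacing the toy sampler with a real PLM and naive Bayes with an LSTM recovers the overall shape of the approach; the point of the sketch is only that the final classifier never sees a human-labeled example.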
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but they do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
(2024-10-29T04:14:32Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
(2024-10-22T06:43:28Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
(2023-06-25T02:39:19Z)
- ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval [22.882301169283323]
We propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus.
Experiments on nine datasets demonstrate that ReGen achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared to baselines using large NLG models.
(2023-05-18T04:30:09Z)
- Beyond prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations [24.3378487252621]
We show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of pre-trained language models.
Our approach achieves an average of 20% absolute improvement over prompt-based zero-shot learning.
(2022-10-29T16:01:51Z)
- ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback [21.168991554983815]
We propose a progressive zero-shot dataset generation framework, ProGen, to guide the generation of new training data.
We show that ProGen achieves on-par or superior performance with only 1% of the synthetic dataset size.
(2022-10-22T02:07:10Z)
- ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning [97.2907428983142]
ZeroGen attempts to use only a PLM to generate data and train a tiny model without relying on task-specific annotation.
We propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.
(2022-05-25T11:38:48Z)
- ZeroBERTo -- Leveraging Zero-Shot Text Classification by Topic Modeling [57.80052276304937]
This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task.
We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
(2022-01-04T20:08:17Z)
- Towards Zero-Label Language Learning [20.28186484098947]
This paper explores zero-label learning in Natural Language Processing (NLP).
No human-annotated data is used anywhere during training and models are trained purely on synthetic data.
Inspired by the recent success of few-shot inference on GPT-3, we present a training data creation procedure named Unsupervised Data Generation.
(2021-09-19T19:00:07Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
(2020-05-18T09:36:51Z)