Self-training Improves Pre-training for Natural Language Understanding
- URL: http://arxiv.org/abs/2010.02194v1
- Date: Mon, 5 Oct 2020 17:52:25 GMT
- Title: Self-training Improves Pre-training for Natural Language Understanding
- Authors: Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur
Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau
- Abstract summary: We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
- Score: 63.78927366363178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised pre-training has led to much recent progress in natural language
understanding. In this paper, we study self-training as another way to leverage
unlabeled data through semi-supervised learning. To obtain additional data for
a specific task, we introduce SentAugment, a data augmentation method which
computes task-specific query embeddings from labeled data to retrieve sentences
from a bank of billions of unlabeled sentences crawled from the web. Unlike
previous semi-supervised methods, our approach does not require in-domain
unlabeled data and is therefore more generally applicable. Experiments show
that self-training is complementary to strong RoBERTa baselines on a variety of
tasks. Our augmentation approach leads to scalable and effective self-training
with improvements of up to 2.6% on standard text classification benchmarks.
Finally, we also show strong gains on knowledge-distillation and few-shot
learning.
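
To make the data flow described above concrete, the following is a minimal sketch of the SentAugment-plus-self-training idea: build a task-specific query embedding from the labeled data, retrieve the nearest sentences from a large unlabeled bank by cosine similarity, and let a teacher model pseudo-label the retrieved sentences for student training. Everything here is illustrative rather than the authors' code: encode is a toy hashed bag-of-words stand-in for the paper's sentence embeddings, teacher_predict is an assumed callable standing in for a fine-tuned RoBERTa teacher, and the function names retrieve and build_self_training_set are invented for the sketch.

import numpy as np

def encode(sentences, dim=64, n_buckets=4096, seed=0):
    # Toy stand-in for a real sentence encoder: hashed bag-of-words
    # projected through a fixed random table, then L2-normalized.
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(n_buckets, dim))
    vecs = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        for tok in sent.lower().split():
            vecs[i] += table[hash(tok) % n_buckets]
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)

def retrieve(labeled_sentences, bank_sentences, k=100):
    # Task-specific query embedding = mean of the labeled-data embeddings;
    # rank the unlabeled bank by cosine similarity and keep the top k.
    query = encode(labeled_sentences).mean(axis=0)
    query /= np.linalg.norm(query) + 1e-8
    scores = encode(bank_sentences) @ query
    top = np.argsort(-scores)[:k]
    return [bank_sentences[i] for i in top]

def build_self_training_set(labeled, bank, teacher_predict, threshold=0.7):
    # Pseudo-label the retrieved sentences with the teacher and keep only
    # confident predictions as synthetic training data for the student.
    retrieved = retrieve([sent for sent, _ in labeled], bank)
    synthetic = []
    for sent in retrieved:
        label, confidence = teacher_predict(sent)
        if confidence >= threshold:
            synthetic.append((sent, label))
    return synthetic

In the paper itself, retrieval runs over a bank of billions of web-crawled sentences with large-scale nearest-neighbor search, and the teacher is a strong RoBERTa model fine-tuned on the labeled task data; the snippet only illustrates the retrieve-then-pseudo-label loop.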
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
Incremental Self-Training (IST) is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbones, effectively improving recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training [20.98770732015944]
Few-shot intent detection involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data.
We show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected.
To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance.
arXiv Detail & Related papers (2023-06-08T15:26:52Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- Investigating a Baseline Of Self Supervised Learning Towards Reducing Labeling Costs For Image Classification [0.0]
The study uses the kaggle.com cats-vs-dogs dataset, MNIST, and Fashion-MNIST to investigate the self-supervised learning task.
Results show that the pretext process in self-supervised learning improves accuracy by around 15% in the downstream classification task.
arXiv Detail & Related papers (2021-08-17T06:43:05Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Data-Efficient Pretraining via Contrastive Self-Supervision [48.255310614527694]
In this work, we evaluate against three core challenges for resource-efficient learning.
We propose a data- and compute-efficient self-supervised, contrastive text encoder, pretrained on 60MB of 'task-internal' text data.
We find our method outperforms RoBERTa, while pretraining and fine-tuning in 1/5th of RoBERTa's fine-tuning time.
arXiv Detail & Related papers (2020-10-02T15:41:57Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, perform within 3% of fully supervised pre-trained language models (see the sketch below).
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
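
As an illustration of the uncertainty-aware selection described in the last entry above, the sketch below estimates predictive uncertainty with Monte Carlo dropout and keeps only low-variance pseudo-labels. This is one common realization of the idea, not necessarily the paper's exact acquisition strategy; model is assumed to be any torch classifier with dropout, and mc_dropout_predict, select_pseudo_labels, n_samples, and max_uncertainty are names and defaults invented for the sketch.

import torch

def mc_dropout_predict(model, inputs, n_samples=10):
    # Run several stochastic forward passes with dropout active and return
    # the mean class probabilities and their variance (an uncertainty proxy).
    model.train()  # keep dropout layers active
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(inputs), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.var(dim=0)

def select_pseudo_labels(model, unlabeled_inputs, max_uncertainty=0.05):
    # Keep only unlabeled examples whose predicted class has low variance
    # across dropout samples; return (kept indices, pseudo-labels).
    mean_probs, var_probs = mc_dropout_predict(model, unlabeled_inputs)
    pseudo = mean_probs.argmax(dim=-1)
    uncertainty = var_probs.gather(-1, pseudo.unsqueeze(-1)).squeeze(-1)
    keep = uncertainty <= max_uncertainty
    return keep.nonzero(as_tuple=True)[0], pseudo[keep]

The selected examples would then be added to the training set for the next self-training round, which is the selection step the entry above improves with uncertainty estimates.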