STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
- URL: http://arxiv.org/abs/2109.06270v1
- Date: Mon, 13 Sep 2021 19:14:01 GMT
- Title: STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
- Authors: Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer
- Abstract summary: We propose STraTA, which stands for Self-Training with Task Augmentation.
Our experiments demonstrate that STraTA can substantially improve sample efficiency across 12 few-shot benchmarks.
Our analyses reveal that task augmentation and self-training are both complementary and independently effective.
- Score: 77.04780470527432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their recent successes in tackling many NLP tasks, large-scale
pre-trained language models do not perform as well in few-shot settings where
only a handful of training examples are available. To address this shortcoming,
we propose STraTA, which stands for Self-Training with Task Augmentation, an
approach that builds on two key ideas for effective leverage of unlabeled data.
First, STraTA uses task augmentation, a novel technique that synthesizes a
large amount of data for auxiliary-task fine-tuning from target-task unlabeled
texts. Second, STraTA performs self-training by further fine-tuning the strong
base model created by task augmentation on a broad distribution of
pseudo-labeled data. Our experiments demonstrate that STraTA can substantially
improve sample efficiency across 12 few-shot benchmarks. Remarkably, on the
SST-2 sentiment dataset, STraTA, with only 8 training examples per class,
achieves comparable results to standard fine-tuning with 67K training examples.
Our analyses reveal that task augmentation and self-training are both
complementary and independently effective.
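To make the second stage concrete, the sketch below shows a generic confidence-filtered self-training loop in Python. It is a minimal illustration, not the paper's implementation: an sklearn-style classifier stands in for the base model that task augmentation would produce, the feature matrices are synthetic, and the simple confidence filter replaces whatever selection scheme the paper uses to build its broad pseudo-labeled distribution.

```python
# Minimal sketch of a self-training loop in the spirit of STraTA's second stage.
# Assumptions: an sklearn-style classifier stands in for the auxiliary-task
# fine-tuned base model; the confidence threshold and round count are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(base_model, X_labeled, y_labeled, X_unlabeled,
               confidence=0.9, rounds=5):
    """Iteratively pseudo-label unlabeled data and refit on the enlarged pool."""
    X_train, y_train = X_labeled, y_labeled
    for _ in range(rounds):
        base_model.fit(X_train, y_train)
        probs = base_model.predict_proba(X_unlabeled)
        pseudo_labels = probs.argmax(axis=1)
        keep = probs.max(axis=1) >= confidence  # retain confident predictions only
        if not keep.any():
            break
        # Re-train on the few labeled examples plus the pseudo-labeled subset.
        X_train = np.vstack([X_labeled, X_unlabeled[keep]])
        y_train = np.concatenate([y_labeled, pseudo_labels[keep]])
    return base_model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(16, 8))      # e.g., 8 examples per class, featurized
    y_lab = np.array([0, 1] * 8)
    X_unlab = rng.normal(size=(500, 8))   # target-task unlabeled texts, featurized
    model = self_train(LogisticRegression(max_iter=1000), X_lab, y_lab, X_unlab)
    print(model.predict(X_unlab[:5]))
```

In the paper's setting, the base model entering this loop would first be fine-tuned on the large auxiliary-task dataset synthesized from the target task's unlabeled texts (the task augmentation stage), rather than trained from scratch on the few labeled examples alone.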
Related papers
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- SEPT: Towards Scalable and Efficient Visual Pre-Training [11.345844145289524]
Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on a simple hypothesis that pre-training on the unlabeled samples with similar distribution to the target task can bring substantial performance gains.
arXiv Detail & Related papers (2022-12-11T11:02:11Z)
- Effective Vision Transformer Training: A Data-Centric Perspective [24.02488085447691]
Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs).
In this paper, we define several metrics, including Dynamic Data Proportion (DDP) and Knowledge Assimilation Rate (KAR).
We propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.
arXiv Detail & Related papers (2022-09-29T17:59:46Z)
- Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degenerate the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z)
- LiST: Lite Self-training Makes Efficient Few-shot Learners [91.28065455714018]
LiST improves by 35% over classic fine-tuning methods and 6% over prompt-tuning, with a 96% reduction in the number of trainable parameters, when fine-tuned with no more than 30 labeled examples from each target domain.
arXiv Detail & Related papers (2021-10-12T18:47:18Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data [5.689320790746046]
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks.
However, MTL must deal with challenges such as overfitting to low-resource tasks, catastrophic forgetting, and negative task transfer.
We propose a novel Transformer architecture consisting of a new conditional attention mechanism and a set of task-conditioned modules.
arXiv Detail & Related papers (2020-09-19T02:04:34Z)