Generate, Annotate, and Learn: Generative Models Advance Self-Training
and Knowledge Distillation
- URL: http://arxiv.org/abs/2106.06168v1
- Date: Fri, 11 Jun 2021 05:01:24 GMT
- Title: Generate, Annotate, and Learn: Generative Models Advance Self-Training
and Knowledge Distillation
- Authors: Xuanli He, Islam Nassar, Jamie Kiros, Gholamreza Haffari, Mohammad
Norouzi
- Abstract summary: Semi-Supervised Learning (SSL) has seen success in many application domains, but this success often hinges on the availability of task-specific unlabeled data.
Knowledge distillation (KD) has enabled compressing deep networks and ensembles, achieving the best results when distilling knowledge on fresh task-specific unlabeled examples.
We present a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize in-domain unlabeled data.
- Score: 58.64720318755764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-Supervised Learning (SSL) has seen success in many application domains,
but this success often hinges on the availability of task-specific unlabeled
data. Knowledge distillation (KD) has enabled compressing deep networks and
ensembles, achieving the best results when distilling knowledge on fresh
task-specific unlabeled examples. However, task-specific unlabeled data can be
challenging to find. We present a general framework called "generate, annotate,
and learn (GAL)" that uses unconditional generative models to synthesize
in-domain unlabeled data, helping advance SSL and KD on different tasks. To
obtain strong task-specific generative models, we adopt generic generative
models, pretrained on open-domain data, and fine-tune them on inputs from
specific tasks. Then, we use existing classifiers to annotate generated
unlabeled examples with soft pseudo labels, which are used for additional
training. When self-training is combined with samples generated from
GPT2-large, fine-tuned on the inputs of each GLUE task, we outperform a strong
RoBERTa-large baseline on the GLUE benchmark. Moreover, KD on GPT-2 samples
yields a new state-of-the-art for 6-layer transformers on the GLUE leaderboard.
Finally, self-training with GAL offers significant gains on image
classification on CIFAR-10 and four tabular tasks from the UCI repository.
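To make the three stages concrete, the sketch below walks through one GAL-style round with Hugging Face transformers: sample synthetic text from a generator, pseudo-label it with an existing classifier, and train a student on the soft labels. The checkpoints, decoding settings, and single optimization step are illustrative assumptions rather than the authors' exact setup; in the paper the generator is GPT2-large fine-tuned on each task's inputs and the annotator is a fine-tuned RoBERTa-large classifier.

```python
# Minimal sketch of "generate, annotate, and learn" (GAL); checkpoints and
# hyperparameters are placeholders, not the paper's exact configuration.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Generate: sample synthetic in-domain text from a GPT-2 model that is
#    assumed to have been fine-tuned on the task's inputs beforehand.
gen_tok = AutoTokenizer.from_pretrained("gpt2-large")
generator = AutoModelForCausalLM.from_pretrained("gpt2-large").to(device).eval()
bos = torch.tensor([[gen_tok.bos_token_id]], device=device)
sample_ids = generator.generate(
    bos, do_sample=True, top_p=0.95, max_length=64,
    num_return_sequences=8, pad_token_id=gen_tok.eos_token_id)
texts = [gen_tok.decode(s, skip_special_tokens=True) for s in sample_ids]

# 2) Annotate: an existing task classifier (the teacher) assigns soft pseudo
#    labels; in practice this would be a checkpoint already fine-tuned on the task.
clf_tok = AutoTokenizer.from_pretrained("roberta-large")
teacher = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large").to(device).eval()
batch = clf_tok(texts, padding=True, truncation=True,
                return_tensors="pt").to(device)
with torch.no_grad():
    soft_labels = teacher(**batch).logits.softmax(dim=-1)

# 3) Learn: train a student (for KD) or the same architecture (for
#    self-training) on the pseudo-labelled synthetic examples.
student = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base").to(device).train()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
student_logits = student(**batch).logits
loss = torch.nn.functional.kl_div(
    student_logits.log_softmax(dim=-1), soft_labels, reduction="batchmean")
loss.backward()
optimizer.step()
```

Using soft probability vectors rather than hard argmax labels is what lets the same loop serve both self-training and knowledge distillation on the generated data.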
Related papers
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weakly supervised learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:06:24Z)
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables [64.0903766169603]
We propose a framework for few-shot semi-supervised learning, coined Self-generated Tasks from UNlabeled Tables (STUNT).
Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label (see the sketch after this list).
We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks.
arXiv Detail & Related papers (2023-03-02T02:37:54Z)
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
- Boosting the Performance of Semi-Supervised Learning with Unsupervised Clustering [10.033658645311188]
We show that ignoring labels altogether for whole epochs intermittently during training can significantly improve performance in the small sample regime.
We demonstrate our method's efficacy in boosting several state-of-the-art SSL algorithms.
arXiv Detail & Related papers (2020-12-01T14:19:14Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
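The sketch below illustrates the core idea mentioned in the STUNT entry above: building few-shot pseudo-tasks from an unlabeled table by treating a randomly chosen column as the target label. The quantile-based discretization into pseudo-classes and the episode sizes are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical helper: one few-shot pseudo-task from an unlabeled numeric table.
import numpy as np

rng = np.random.default_rng(0)

def make_pseudo_task(table: np.ndarray, n_classes: int = 2,
                     shots_per_class: int = 5):
    """Pick a random column as a pseudo label and sample a few-shot episode."""
    n_rows, n_cols = table.shape
    target_col = rng.integers(n_cols)                 # random column -> pseudo label
    features = np.delete(table, target_col, axis=1)   # remaining columns -> inputs
    # Bucket the chosen column's values into pseudo-classes by quantile.
    cuts = np.quantile(table[:, target_col],
                       np.linspace(0, 1, n_classes + 1)[1:-1])
    labels = np.digitize(table[:, target_col], cuts)
    # Sample a small support set per pseudo-class (a few-shot episode).
    xs, ys = [], []
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        pick = rng.choice(idx, size=min(shots_per_class, len(idx)), replace=False)
        xs.append(features[pick])
        ys.append(labels[pick])
    return np.concatenate(xs), np.concatenate(ys)

# Example: generate a stream of diverse tasks for a meta-learner.
unlabeled_table = rng.normal(size=(200, 8))
support_x, support_y = make_pseudo_task(unlabeled_table)
```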