Fugu-MT Paper Translation (Abstract): Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation

Paper abstract: Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation

arxiv url: http://arxiv.org/abs/2106.06168v1
Date: Fri, 11 Jun 2021 05:01:24 GMT
Status: Translation complete
Last updated in system: 2021-06-14 14:11:20.979395
Title: Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation
Title (reference translation): Generate, Annotate, Learn: Advancing Self-Training and Knowledge Distillation with Generative Models
Authors: Xuanli He, Islam Nassar, Jamie Kiros, Gholamreza Haffari, Mohammad Norouzi
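Abstract summary: Semi-Supervised Learning (SSL) has been successful in many application domains. Knowledge distillation (KD) enables the compression of deep networks and ensembles, and achieves the best results when knowledge is distilled on fresh task-specific unlabeled examples. We propose a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize in-domain unlabeled data.
Reference score (independently computed attention score): 58.64720318755764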
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semi-Supervised Learning (SSL) has seen success in many application domains, but this success often hinges on the availability of task-specific unlabeled data. Knowledge distillation (KD) has enabled compressing deep networks and ensembles, achieving the best results when distilling knowledge on fresh task-specific unlabeled examples. However, task-specific unlabeled data can be challenging to find. We present a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize in-domain unlabeled data, helping advance SSL and KD on different tasks. To obtain strong task-specific generative models, we adopt generic generative models, pretrained on open-domain data, and fine-tune them on inputs from specific tasks. Then, we use existing classifiers to annotate generated unlabeled examples with soft pseudo labels, which are used for additional training. When self-training is combined with samples generated from GPT2-large, fine-tuned on the inputs of each GLUE task, we outperform a strong RoBERTa-large baseline on the GLUE benchmark. Moreover, KD on GPT-2 samples yields a new state-of-the-art for 6-layer transformers on the GLUE leaderboard. Finally, self-training with GAL offers significant gains on image classification on CIFAR-10 and four tabular tasks from the UCI repository.
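Abstract (reference translation): Semi-Supervised Learning (SSL) has been successful in many application domains. Knowledge distillation (KD) enables the compression of deep networks and ensembles, and achieves the best results when knowledge is distilled on fresh task-specific unlabeled examples. However, task-specific unlabeled data is difficult to find. We propose a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize unlabeled data and advance SSL and KD on different tasks. To obtain task-specific generative models, we adopt generic generative models pretrained on open-domain data and fine-tune them on inputs from specific tasks. Next, we use existing classifiers to annotate the generated unlabeled examples with soft pseudo labels. When self-training is combined with samples generated from GPT2-large fine-tuned on the inputs of each GLUE task, we outperform a strong RoBERTa-large baseline on the GLUE benchmark. Furthermore, KD on GPT-2 samples yields a new state of the art for 6-layer transformers on the GLUE leaderboard. Finally, self-training with GAL yields significant gains on CIFAR-10 image classification and on four tabular tasks from the UCI repository.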
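
The three GAL steps named in the abstract (generate, annotate, learn) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes 1-D inputs and binary labels, replaces the fine-tuned generative model with a single Gaussian fit to the labeled inputs, and uses a hand-written logistic regression as both the teacher and the student. All function names (`fit_generator`, `train_logreg`, `gal`) are hypothetical and exist only for this illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, soft_ys, epochs=200, lr=0.5):
    """Fit 1-D logistic regression to (possibly soft) targets in [0, 1]
    with full-batch gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, soft_ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def fit_generator(xs):
    """Toy stand-in for an unconditional generative model (assumption):
    a single Gaussian fit to the task inputs, ignoring labels."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)


def gal(labeled_x, labeled_y, num_synthetic=200, seed=0):
    rng = random.Random(seed)
    # An existing classifier (teacher) trained on the labeled data.
    teacher = train_logreg(labeled_x, labeled_y)
    # 1. Generate: sample synthetic in-domain unlabeled inputs.
    mu, sigma = fit_generator(labeled_x)
    synth_x = [rng.gauss(mu, sigma) for _ in range(num_synthetic)]
    # 2. Annotate: attach soft pseudo labels from the teacher.
    synth_y = [sigmoid(teacher[0] * x + teacher[1]) for x in synth_x]
    # 3. Learn: train a student on labeled plus pseudo-labeled data.
    student = train_logreg(labeled_x + synth_x, list(labeled_y) + synth_y)
    return teacher, student, synth_x, synth_y


if __name__ == "__main__":
    xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
    ys = [0, 0, 0, 1, 1, 1]
    teacher, student, _, _ = gal(xs, ys)
    print("teacher (w, b):", teacher)
    print("student (w, b):", student)
```

In this sketch the student has the same form as the teacher, which corresponds to the self-training setting. Under the same assumptions, a knowledge distillation variant would swap in a smaller student model at step 3.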

Related papers