Fugu-MT Paper Translation (Abstract): Role-Wise Data Augmentation for Knowledge Distillation

Paper overview: Role-Wise Data Augmentation for Knowledge Distillation

arxiv url: http://arxiv.org/abs/2004.08861v1
Date: Sun, 19 Apr 2020 14:22:17 GMT
Status: Translation complete
System update date: 2022-12-11 23:48:03.412816
Title: Role-Wise Data Augmentation for Knowledge Distillation
Title (reference translation): Role-Wise Data Augmentation in Knowledge Distillation
Authors: Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong
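Abstract summary: Knowledge distillation (KD) is a common method for transferring the knowledge learned by one machine learning model to another. We design data augmentation agents with distinct roles to facilitate knowledge distillation. We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student.
Reference score (independently computed attention score): 48.115719640111394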
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model (the teacher) into another model (the student), where typically, the teacher has a greater capacity (e.g., more parameters or higher bit-widths). To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data, and this data is the only medium by which the teacher's knowledge can be demonstrated. Due to the difference in model capacities, the student may not benefit fully from the same data points on which the teacher is trained. On the other hand, a human teacher may demonstrate a piece of knowledge with individualized examples adapted to a particular student, for instance, in terms of her cultural background and interests. Inspired by this behavior, we design data augmentation agents with distinct roles to facilitate knowledge distillation. Our data augmentation agents generate distinct training data for the teacher and student, respectively. We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student. We compare our approach with existing KD methods on training popular neural architectures and demonstrate that role-wise data augmentation improves the effectiveness of KD over strong prior approaches. The code for reproducing our results can be found at https://github.com/bigaidream-projects/role-kd
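Abstract (reference translation): Knowledge distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model (the teacher) into another model (the student). To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data. Due to the difference in model capacities, the student may not benefit fully from the same data points on which the teacher is trained. On the other hand, a human teacher may demonstrate a piece of knowledge with individualized examples adapted to a particular student, for instance in terms of cultural background and interests. Inspired by this behavior, we design data augmentation agents with distinct roles to facilitate knowledge distillation. Our data augmentation agents generate distinct training data for the teacher and the student. We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student. We compare our approach with existing KD methods on training popular neural architectures and demonstrate that role-wise data augmentation improves the effectiveness of KD over strong prior approaches. The results can be reproduced at https://github.com/bigaidream-projects/role-kd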
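
To make the idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). It assumes the widely used temperature-softened KL distillation loss, and the two augmentation functions, the toy linear models, and all weights are hypothetical stand-ins chosen only for illustration. The abstract does not say how the learned augmentation agents work or which input the teacher sees during distillation, so both choices below are assumptions.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def teacher_augment(x, rng):
    """Hypothetical teacher-role augmentation: small Gaussian jitter."""
    return [v + rng.gauss(0.0, 0.05) for v in x]


def student_augment(x, rng):
    """Hypothetical student-role augmentation: zero out one random feature."""
    drop = rng.randrange(len(x))
    return [0.0 if i == drop else v for i, v in enumerate(x)]


def linear_logits(weights, x):
    """Toy model: one weight row per class, no bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


rng = random.Random(0)
x = [1.0, -0.5, 2.0]
teacher_w = [[0.8, 0.1, 0.3], [-0.2, 0.5, 0.1]]  # made-up weights
student_w = [[0.5, 0.0, 0.2], [0.0, 0.3, 0.0]]   # made-up weights

# Teacher role: the sample the teacher would be trained on (training not shown).
x_teacher = teacher_augment(x, rng)

# Student role: both models score the student-specific sample (an assumption),
# and the student is penalized for diverging from the teacher's soft targets.
x_student = student_augment(x, rng)
loss = kd_loss(linear_logits(student_w, x_student),
               linear_logits(teacher_w, x_student))
print(f"distillation loss on the student-role sample: {loss:.4f}")
```

The point of the sketch is only the separation of roles: each model gets its own augmentation function, while the distillation loss itself is unchanged.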

Related papers

Improving Knowledge Distillation with Teacher's Explanation [14.935696904019146]
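This paper introduces the Knowledge Explaining Distillation (KED) framework. KED lets the student learn not only from the teacher's predictions but also from the teacher's explanations. Experiments on a variety of datasets show that KED students substantially outperform KD students of similar complexity.
Paper reference translation (metadata) (2023-10-04T04:18:01Z)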
Improved knowledge distillation by utilizing backward pass knowledge in neural networks [17.437510399431606]
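Knowledge distillation (KD) is one of the key techniques in model compression. In this work, we extract knowledge from the teacher's backward pass to create new auxiliary training samples. We show how this technique can be used successfully in natural language processing (NLP) and language understanding applications.
Paper reference translation (metadata) (2023-01-27T22:07:38Z)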
Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
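We introduce a "teach what you should teach" strategy into the knowledge distillation framework. We propose a data-based distillation method, "TST", that searches for desirable augmented samples to support more efficient and rational distillation. Specifically, we design a neural-network-based data augmentation module with a prior bias, which helps find samples that match the teacher's strengths and the student's weaknesses.
Paper reference translation (metadata) (2022-12-11T06:22:14Z)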
Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
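Knowledge distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. We propose inconsistent knowledge distillation (IKD), which aims to distill the knowledge inherent in the teacher model's counter-intuitive perceptions. Our method outperforms state-of-the-art KD baselines on one-stage, two-stage, and anchor-free object detectors.
Paper reference translation (metadata) (2022-09-20T16:36:28Z)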
Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
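We propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation. DPK makes the performance of the teacher and student models positively correlated, so applying a larger teacher can further improve the student's accuracy.
Paper reference translation (metadata) (2022-06-13T11:52:13Z)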
Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
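The knowledge of a well-trained deep neural network (a so-called "teacher") is useful for learning similar tasks. Knowledge distillation extracts knowledge from the teacher and integrates it with the target model. Instead of requiring the teacher to work on the same task as the student, we borrow knowledge from a teacher trained on a general label space.
Paper reference translation (metadata) (2022-05-04T06:49:47Z)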
Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
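Knowledge distillation (KD) is a popular technique for transferring knowledge from a teacher model to a student model. We propose a novel inheritance and exploration knowledge distillation framework (IE-KD). Our IE-KD framework is generic and can easily be combined with existing distillation or mutual learning methods for training deep neural networks.
Paper reference translation (metadata) (2021-07-01T02:20:56Z)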
Undistillable: Making A Nasty Teacher That CANNOT teach students [84.6111281091602]
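This paper introduces and investigates the Nasty Teacher, a specially trained teacher network that achieves nearly the same performance as a normal one. We propose a simple yet effective algorithm for building the nasty teacher, called self-undermining knowledge distillation.
Paper reference translation (metadata) (2021-05-16T08:41:30Z)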
Learning from a Lightweight Teacher for Efficient Knowledge Distillation [14.865673786025525]
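This paper proposes LW-KD, short for lightweight knowledge distillation. It first trains a lightweight teacher network on a synthesized simple dataset, with an adjustable class number equal to that of the target dataset. The teacher then generates soft targets, and an enhanced KD loss guides the student's learning. This loss combines the KD loss with an adversarial loss that makes the student's output indistinguishable from the teacher's output.
Paper reference translation (metadata) (2020-05-19T01:54:15Z)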

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
