Fugu-MT Paper Translation (Summary): Generalized Knowledge Distillation via Relationship Matching

Paper summary: Generalized Knowledge Distillation via Relationship Matching

arxiv url: http://arxiv.org/abs/2205.01915v1
Date: Wed, 4 May 2022 06:49:47 GMT
Status: Translation complete
Last updated in system: 2022-05-05 14:09:25.736197
Title: Generalized Knowledge Distillation via Relationship Matching
Authors: Han-Jia Ye, Su Lu, De-Chuan Zhan
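Abstract summary: The knowledge of a well-trained deep neural network (the so-called "teacher") is useful for learning similar tasks. Knowledge distillation extracts knowledge from the teacher and integrates it with the target model. Instead of requiring the teacher to work on the same task as the student, the approach borrows knowledge from a teacher trained on a general label space.
Reference score (in-house attention score): 53.69235109551099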
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks. Knowledge distillation extracts knowledge from the teacher and integrates it with the target model (a.k.a. the "student"), which expands the student's knowledge and improves its learning efficacy. Instead of enforcing the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained from a general label space -- in this "Generalized Knowledge Distillation (GKD)", the classes of the teacher and the student may be the same, completely different, or partially overlapped. We claim that the comparison ability between instances acts as an essential factor threading knowledge across tasks, and propose the RElationship FacIlitated Local cLassifiEr Distillation (REFILLED) approach, which decouples the GKD flow of the embedding and the top-layer classifier. In particular, different from reconciling the instance-label confidence between models, REFILLED requires the teacher to reweight the hard tuples pushed forward by the student and then matches the similarity comparison levels between instances. An embedding-induced classifier based on the teacher model supervises the student's classification confidence and adaptively emphasizes the most related supervision from the teacher. REFILLED demonstrates strong discriminative ability when the classes of the teacher vary from the same to a fully non-overlapped set w.r.t. the student. It also achieves state-of-the-art performance on standard knowledge distillation, one-step incremental learning, and few-shot learning tasks.
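The abstract describes two components: relationship matching on hard tuples that the student pushes forward, and an embedding-induced classifier built from the teacher. The NumPy sketch below is one possible reading of those two ideas, not the authors' implementation. The tuple construction (one positive plus the `k` negatives the student finds most similar to the anchor), cosine similarity, the temperature `tau`, and all function names are assumptions, and the adaptive emphasis on the most related teacher supervision is omitted.

```python
import numpy as np

# Hypothetical sketch of the two REFILLED ideas as read from the abstract;
# tuple construction, similarity, temperature, and names are assumptions.


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def kl(p, q, eps=1e-12):
    # KL(p || q) summed over the last axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def tuple_relation_loss(f_s, f_t, labels, anchor, k=3, tau=0.1):
    """Stage 1 sketch: match similarity comparisons on one hard tuple.

    The student chooses the tuple (one positive plus the k negatives it finds
    most similar to the anchor); the teacher's softmax over the same
    comparisons is the target. Embedding widths of f_s and f_t may differ.
    """
    zs, zt = l2_normalize(f_s), l2_normalize(f_t)
    sim_s = zs @ zs[anchor]                      # student cosine similarities to anchor
    same = labels == labels[anchor]
    same[anchor] = False
    pos = np.flatnonzero(same)[0]                # first positive; assumes one exists
    negs = np.flatnonzero(labels != labels[anchor])
    hard = negs[np.argsort(-sim_s[negs])[:k]]    # hardest negatives for the student
    idx = np.concatenate(([pos], hard))
    p_s = softmax(sim_s[idx] / tau)
    p_t = softmax((zt @ zt[anchor])[idx] / tau)  # teacher reweights the same tuple
    return float(kl(p_t, p_s))


def embedding_induced_targets(f_t, labels, num_classes, tau=0.1):
    """Stage 2 sketch: soft labels from a nearest-class-center classifier built
    on teacher embeddings and the student's labels within the batch."""
    zt = l2_normalize(f_t)
    logits = np.full((len(zt), num_classes), -np.inf)  # absent classes get zero mass
    for c in np.unique(labels):
        center = l2_normalize(zt[labels == c].mean(axis=0))
        logits[:, c] = zt @ center                      # cosine similarity to class center
    return softmax(logits / tau)


rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])
f_t = rng.normal(size=(6, 16))   # teacher embeddings
f_s = rng.normal(size=(6, 8))    # student embeddings (narrower)
loss_rel = tuple_relation_loss(f_s, f_t, labels, anchor=0)
targets = embedding_induced_targets(f_t, labels, num_classes=3)
student_probs = softmax(rng.normal(size=(6, 3)))
loss_cls = float(kl(targets, student_probs).mean())
```

Because the local classifier is built from the student's labels on the teacher's embeddings, the teacher's own label space never appears in this sketch, which is one way to see why the teacher and student classes are allowed to differ.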

Related papers

Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias [5.434571018755813]
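Group Relative Knowledge Distillation (GRKD) is a new framework that distills the teacher's knowledge by learning the relative ranking among classes. Experiments on classification benchmarks show that GRKD generalizes better than existing methods.
Reference translation (metadata) (2025-04-29T07:23:22Z)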
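The GRKD summary speaks of learning the relative ranking among classes. As a hedged illustration only, the sketch below uses a generic pairwise hinge loss over the teacher's class ordering; it is not GRKD's published objective, and `pairwise_rank_distill` and `margin` are hypothetical names.

```python
import numpy as np

# Generic pairwise ranking distillation; an assumption, not GRKD's actual loss.


def pairwise_rank_distill(z_s, z_t, margin=0.0):
    """Hinge penalty whenever the student's class ordering disagrees with the teacher's.

    z_s, z_t: (n, C) logits. For every class pair (j, k) with z_t[j] > z_t[k],
    the student pays max(0, margin - (z_s[j] - z_s[k])), averaged over such pairs.
    """
    gap_t = z_t[:, :, None] - z_t[:, None, :]   # (n, C, C) teacher pairwise gaps
    gap_s = z_s[:, :, None] - z_s[:, None, :]   # (n, C, C) student pairwise gaps
    mask = gap_t > 0                             # pairs the teacher ranks j above k
    hinge = np.maximum(0.0, margin - gap_s)
    return float((hinge * mask).sum() / max(int(mask.sum()), 1))


rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 5))
same_order = pairwise_rank_distill(2.0 * z_t + 1.0, z_t)  # same ranking, new scale
reversed_order = pairwise_rank_distill(-z_t, z_t)
```

With `margin=0.0` the loss depends only on the order of the student's logits, so a rescaled or shifted student that keeps the teacher's ranking incurs no penalty.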
Relational Representation Distillation [6.24302896438145]
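Knowledge distillation involves transferring knowledge from a large, cumbersome teacher model to a more compact student model. Standard approaches fail to capture important structural relationships in the teacher's internal representations. Recent work has turned to contrastive learning objectives, but these methods impose overly strict constraints through instance discrimination. The proposed method uses different temperature parameters for the teacher and student distributions, with sharper student outputs, enabling precise learning of primary relationships while preserving secondary similarities.
Reference translation (metadata) (2024-07-16T14:56:13Z)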
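The Relational Representation Distillation summary mentions different temperatures for the teacher and student distributions. The sketch below assumes the distributions are softmaxes over cosine similarities to a reference bank of teacher embeddings; the bank, the shared embedding width, the temperature values, and the function names are assumptions rather than the paper's settings.

```python
import numpy as np

# Illustrative two-temperature relational KD; bank and temperatures are assumptions.


def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def two_temperature_relational_kd(f_s, f_t, bank, tau_s=0.05, tau_t=0.1):
    """KL(teacher || student) between similarity distributions over a reference bank.

    f_s, f_t: (n, d) embeddings, assumed already projected to a shared width d.
    bank: (m, d) reference teacher embeddings. tau_s < tau_t sharpens the student.
    """
    b = l2_normalize(bank)
    log_p_t = log_softmax(l2_normalize(f_t) @ b.T / tau_t)
    log_p_s = log_softmax(l2_normalize(f_s) @ b.T / tau_s)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1)
    return float(kl.mean())


rng = np.random.default_rng(0)
bank = rng.normal(size=(32, 16))
f_t = rng.normal(size=(8, 16))
f_s = f_t + 0.1 * rng.normal(size=(8, 16))
loss = two_temperature_relational_kd(f_s, f_t, bank)
```

Setting `tau_s` below `tau_t` means the same similarity gap yields a sharper student distribution; with equal temperatures and identical embeddings the loss is exactly zero.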
Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
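This work proposes Dynamic Prior Knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation. DPK makes the student's performance positively correlated with the teacher's, so the student's accuracy can be further improved by applying a larger teacher.
Reference translation (metadata) (2022-06-13T11:52:13Z)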
Knowledge Distillation from A Stronger Teacher [44.11781464210916]
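This paper proposes DIST, a method for distilling effectively from a stronger teacher. Empirically, the discrepancy between the predictions of the student and the teacher tends to be fairly severe. The proposed method is simple and practical, and is shown to adapt to a variety of architectures.
Reference translation (metadata) (2022-05-21T08:30:58Z)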
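The DIST summary does not say how the matching is done. As we understand the DIST paper, it replaces the usual KL term with a correlation-based match of inter-class relations (each sample's probability vector across classes) and intra-class relations (each class's probabilities across the batch). The sketch below follows that reading; the weights `beta` and `gamma`, the temperature, and the function names are assumptions, not the authors' settings.

```python
import numpy as np

# Sketch of a DIST-style relation loss as we read it; weights and names are assumptions.


def softmax(z, tau=1.0):
    z = z / tau
    z = z - z.max(axis=1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pearson_distance(a, b, axis, eps=1e-12):
    """Mean of 1 - Pearson correlation between a and b, computed along `axis`."""
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    num = (a * b).sum(axis=axis)
    den = np.sqrt((a ** 2).sum(axis=axis) * (b ** 2).sum(axis=axis)) + eps
    return float(np.mean(1.0 - num / den))


def dist_style_loss(z_s, z_t, tau=1.0, beta=1.0, gamma=1.0):
    y_s, y_t = softmax(z_s, tau), softmax(z_t, tau)
    inter = pearson_distance(y_s, y_t, axis=1)   # each sample's relation across classes
    intra = pearson_distance(y_s, y_t, axis=0)   # each class's relation across the batch
    return tau ** 2 * (beta * inter + gamma * intra)


rng = np.random.default_rng(0)
z_t = 3.0 * rng.normal(size=(8, 10))   # a "stronger", more confident teacher
z_s = rng.normal(size=(8, 10))
loss = dist_style_loss(z_s, z_t, tau=4.0)
```

Because Pearson correlation ignores the scale and offset of the compared vectors, the student only has to reproduce the teacher's relative preferences, not its exact confidence, which fits the summary's point about large student-teacher discrepancies.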
Does Knowledge Distillation Really Work? [106.38447017262183]
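Knowledge distillation can improve student generalization, but it does not work the way it is commonly understood. Optimization difficulties are identified as a key reason why the student fails to match the teacher.
Reference translation (metadata) (2021-06-10T17:44:02Z)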
Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
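This paper proposes knowledge consistent distillation, a new student-dependent distillation method that makes the teacher's knowledge more consistent with the student. The method is highly flexible and can easily be combined with other state-of-the-art methods.
Reference translation (metadata) (2021-03-31T06:52:20Z)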
Distilling Knowledge via Intermediate Classifier Heads [0.5584060970507505]
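Knowledge distillation is a transfer learning approach for training a resource-limited student model under the guidance of a larger pretrained teacher model. To mitigate the effect of the capacity gap, knowledge distillation via intermediate heads is introduced. Experiments on various teacher-student pairs and datasets show that the proposed method outperforms standard knowledge distillation.
Reference translation (metadata) (2021-02-28T12:52:52Z)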
Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation [22.14228918338769]
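This paper proposes a new training framework in which the learning of general knowledge is more in line with the idea of reaching a consensus. It effectively improves model generalization without sacrificing training efficiency.
Reference translation (metadata) (2021-02-22T05:23:34Z)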
Multi-level Knowledge Distillation [13.71183256776644]
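Multi-level Knowledge Distillation (MLKD) is introduced to transfer richer representational knowledge from the teacher network to the student network. MLKD employs three new teacher-student similarities: individual similarity, relational similarity, and category similarity. Experiments show that MLKD outperforms other state-of-the-art methods on both same-architecture and cross-architecture tasks.
Reference translation (metadata) (2020-12-01T15:27:15Z)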
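The three MLKD similarities can be illustrated with a generic NumPy sketch. The concrete definitions here (squared error between per-sample probabilities, between batch cosine-similarity matrices of embeddings, and between class-by-class similarity matrices) are our own illustrative choices, not MLKD's published losses, and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch: these three definitions are assumptions, not MLKD's losses.


def l2n(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def three_level_similarity_loss(f_s, f_t, p_s, p_t):
    """Return (individual, relational, category) mismatch terms.

    f_s, f_t: (n, d_s), (n, d_t) embeddings; widths may differ.
    p_s, p_t: (n, C) class-probability matrices for the same batch.
    """
    # individual: per-sample squared error between probability vectors
    individual = np.mean(np.sum((p_s - p_t) ** 2, axis=1))
    # relational: compare n x n cosine-similarity matrices of the embeddings
    g_s, g_t = l2n(f_s) @ l2n(f_s).T, l2n(f_t) @ l2n(f_t).T
    relational = np.mean((g_s - g_t) ** 2)
    # category: compare C x C cosine similarities between class columns
    q_s, q_t = l2n(p_s.T), l2n(p_t.T)
    category = np.mean((q_s @ q_s.T - q_t @ q_t.T) ** 2)
    return float(individual), float(relational), float(category)


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
f_s, f_t = rng.normal(size=(6, 8)), rng.normal(size=(6, 16))
p_s, p_t = softmax(rng.normal(size=(6, 4))), softmax(rng.normal(size=(6, 4)))
losses = three_level_similarity_loss(f_s, f_t, p_s, p_t)
```

The relational and category terms compare Gram matrices rather than raw features, so the student and teacher embeddings do not need the same width.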
Dual Policy Distillation [58.43610940026261]
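Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging deep reinforcement learning tasks. This work introduces Dual Policy Distillation (DPD), a student-student framework in which two learners operate in the same environment and explore different perspectives of it. A key challenge in developing this dual learning framework is identifying beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
Reference translation (metadata) (2020-06-07T06:49:47Z)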
Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
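Knowledge distillation (KD) is a common method for transferring the knowledge learned by one machine learning model to another. We design data augmentation agents with distinct roles to facilitate knowledge distillation. We empirically find that specially tailored data points enable the teacher's knowledge to be demonstrated to the student more effectively.
Reference translation (metadata) (2020-04-19T14:22:17Z)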

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

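The operator of this site makes no guarantee about the quality of the site or any of its content, including translations, and accepts no liability for any outcome resulting from its use.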