Fugu-MT Paper Translation (Abstract): Towards Understanding Knowledge Distillation

Paper overview: Towards Understanding Knowledge Distillation

arxiv url: http://arxiv.org/abs/2105.13093v1
Date: Thu, 27 May 2021 12:45:08 GMT
Status: Translation complete
Date updated in system: 2021-05-29 01:28:08.847181
Title: Towards Understanding Knowledge Distillation
Title (reference translation): Towards Understanding Knowledge Distillation
Authors: Mary Phuong, Christoph H. Lampert
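Abstract summary: Knowledge distillation is an empirically very successful technique for knowledge transfer between classifiers. So far there is no satisfactory theoretical explanation of this phenomenon. This paper provides the first insights into the working mechanisms of distillation by studying the special case of linear and deep linear classifiers.
Reference score (independently computed attention score): 37.71779364624616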
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Knowledge distillation, i.e., one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably if trained with the outputs of another classifier as soft labels, instead of from ground truth data. So far, however, there is no satisfactory theoretical explanation of this phenomenon. In this work, we provide the first insights into the working mechanisms of distillation by studying the special case of linear and deep linear classifiers. Specifically, we prove a generalization bound that establishes fast convergence of the expected risk of a distillation-trained linear classifier. From the bound and its proof we extract three key factors that determine the success of distillation: (1) data geometry: geometric properties of the data distribution, in particular class separation, have a direct influence on the convergence speed of the risk; (2) optimization bias: gradient descent optimization finds a very favorable minimum of the distillation objective; and (3) strong monotonicity: the expected risk of the student classifier always decreases when the size of the training set grows.
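Abstract (reference translation): Knowledge distillation, i.e., one classifier being trained on the outputs of another classifier, is an empirically very successful technique. It has been observed that classifiers learn faster and more reliably when trained with the outputs of another classifier as soft labels rather than from ground truth data. So far, however, there is no satisfactory theoretical explanation of this phenomenon. In this work, we provide the first insights into the working mechanisms of distillation by studying the special case of linear and deep linear classifiers. Specifically, we prove a generalization bound that establishes fast convergence of the expected risk of a distillation-trained linear classifier. From the bound and its proof we extract three key factors that determine the success of distillation: (1) data geometry: geometric properties of the data distribution, in particular class separation, have a direct influence on the convergence speed of the risk; (2) optimization bias: gradient descent optimization finds a very favorable minimum of the distillation objective; and (3) strong monotonicity: the expected risk of the student classifier always decreases when the size of the training set grows.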
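The setting described in the abstract, a linear student trained by gradient descent on a linear teacher's soft labels, can be sketched as follows. This is only an illustrative sketch, not the authors' code: the synthetic Gaussian data, the dimensions, the learning rate, the step count, and the helper names `sigmoid` and `distill_linear` are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def distill_linear(X, teacher_w, lr=0.5, steps=2000):
    """Fit a linear student to the teacher's soft labels.

    Hypothetical helper: minimises the mean cross-entropy between the
    student's and the teacher's output probabilities with plain
    gradient descent, starting from the zero vector.
    """
    soft_labels = sigmoid(X @ teacher_w)      # teacher outputs used as soft labels
    w = np.zeros(X.shape[1])                  # student weights, zero-initialised
    for _ in range(steps):
        student_probs = sigmoid(X @ w)
        # Gradient of the mean cross-entropy with respect to w.
        grad = X.T @ (student_probs - soft_labels) / len(X)
        w -= lr * grad
    return w


# Assumed toy setup: 200 Gaussian inputs in 5 dimensions, random teacher.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
teacher_w = rng.normal(size=n_features)
X = rng.normal(size=(n_samples, n_features))

student_w = distill_linear(X, teacher_w)

# Fraction of training points on which student and teacher predict the same class.
agreement = np.mean(np.sign(X @ student_w) == np.sign(X @ teacher_w))
# Cosine similarity between the student and teacher weight vectors.
cosine = student_w @ teacher_w / (np.linalg.norm(student_w) * np.linalg.norm(teacher_w))
print(f"agreement={agreement:.3f}, cosine={cosine:.3f}")
```

Because no ground-truth labels appear anywhere in the loop, the student learns only from the teacher's outputs. Shrinking `n_samples` below `n_features` in this sketch is one way to explore how the amount and geometry of the data affect what the student recovers.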

Related papers

Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective [100.54185280153753]
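We find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries. We propose a generic postprocessing step based on flow matching that shrinks the gap between the distribution learned by a pre-trained denoising diffusion model and the real data distribution.
Paper reference translation (metadata) (2025-03-13T17:59:59Z)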
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels [10.696635172502141]
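Multi-round self-distillation effectively performs label averaging among instances with high feature correlations. We propose a novel and efficient single-round self-distillation method that uses refined partial labels derived from the teacher's top two softmax outputs.
Paper reference translation (metadata) (2024-02-16T07:13:12Z)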
The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
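Real-world applications of deep neural networks are hindered by their unstable predictions when faced with noisy inputs and adversarial attacks. We show how to design efficient classifiers with a certified radius by relying on noise injection into the inputs. The new certification procedure allows pre-trained models to be used with randomized smoothing, effectively improving the current certified radius in a zero-shot manner.
Paper reference translation (metadata) (2023-09-28T22:41:47Z)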
Conditional Generative Data-Free Knowledge Distillation based on Attention Transfer [0.8594140167290099]
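We propose a conditional generative data-free knowledge distillation (CGDD) framework for training efficient portable networks without any real data. In this framework, preset labels are introduced as additional auxiliary information alongside the knowledge extracted from the teacher model. On CIFAR10, CIFAR100, and Caltech101, relative accuracies of 99.63%, 99.07%, and 99.84% were obtained.
Paper reference translation (metadata) (2021-12-31T09:23:40Z)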
Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
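"Benign overfitting", where classifiers memorize noisy training data yet still achieve good generalization performance, has drawn great attention in the machine learning community. This work shows that benign overfitting indeed occurs in adversarial training, a principled approach to defending against adversarial examples.
Paper reference translation (metadata) (2021-12-31T00:27:31Z)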
Response-based Distillation for Incremental Object Detection [2.337183337110597]
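Traditional object detection is ill-suited to incremental learning. Directly fine-tuning a well-trained detection model with only new data leads to catastrophic forgetting. We propose a fully response-based incremental distillation method that focuses on learning responses from detection bounding boxes and classification predictions.
Paper reference translation (metadata) (2021-10-26T08:07:55Z)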
RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
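We introduce a method that leverages unlabeled data to produce generalization bounds. We prove that the bound is valid for 0-1 empirical risk minimization. This work gives practitioners an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
Paper reference translation (metadata) (2021-05-01T17:05:29Z)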
Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
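We propose leveraging both labeled and unlabeled data to improve accuracy through knowledge distillation. We propose a mask-guided mean teacher framework with perturbation-sensitive sample mining. Experiments show that the proposed method improves performance significantly compared with supervised methods trained on labeled data only.
Paper reference translation (metadata) (2020-07-21T13:27:09Z)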
Regularizing Class-wise Predictions via Self-knowledge Distillation [80.76254453115766]
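We propose a new regularization method that penalizes the predictive distribution between similar samples. This regularizes the dark knowledge (i.e., the knowledge about wrong predictions) of a single network. Experimental results on image classification tasks show that this simple yet powerful method significantly improves generalization ability.
Paper reference translation (metadata) (2020-03-31T06:03:51Z)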
On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime [18.788429230344214]
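We perform a theoretical analysis of knowledge distillation (KD) in the setting of extremely wide two-layer non-linear networks. We establish results on what the student network learns and on the rate of convergence of the student network. We also verify the lottery ticket hypothesis (Frankle & Carbin) in this model.
Paper reference translation (metadata) (2020-03-30T13:03:28Z)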

The related-paper list is generated automatically from the titles and abstracts of papers on this site.
