Fugu-MT paper translation (abstract): Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

Paper overview: Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

arxiv url: http://arxiv.org/abs/2012.09816v1
Date: Thu, 17 Dec 2020 18:34:45 GMT
Status: translation complete
Last updated in system: 2021-05-02 07:34:34.484266
Title: Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Authors: Zeyuan Allen-Zhu and Yuanzhi Li
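Abstract summary: This work studies how ensembles of trained models improve test accuracy and how an ensemble's superior performance can be distilled into a single model. Ensembling and knowledge distillation in deep learning behave very differently from what traditional learning theory predicts. The authors also show that self-distillation can be viewed as implicitly combining ensembling and knowledge distillation to improve test accuracy.
Reference score (attention score computed independently by this site): 93.18238573921629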
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We formally study how Ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using Knowledge Distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and they only differ by the random seeds used in the initialization. We empirically show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory, especially differently from ensemble of random feature mappings or the neural-tangent-kernel feature mappings, and is potentially out of the scope of existing theorems. Thus, to properly understand ensemble and knowledge distillation in deep learning, we develop a theory showing that when data has a structure we refer to as "multi-view", then ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model by training a single model to match the output of the ensemble instead of the true label. Our result sheds light on how ensemble works in deep learning in a way that is completely different from traditional theorems, and how the "dark knowledge" is hidden in the outputs of the ensemble -- that can be used in knowledge distillation -- comparing to the true data labels. In the end, we prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
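The abstract describes two mechanisms: averaging the outputs of independently seeded networks, and training a single model to match that averaged output instead of the true label. The following is a minimal sketch of both, not the authors' implementation. All logit values are made up for illustration, averaging is done on raw outputs (one possible reading of "average of the outputs"), and the `temperature` parameter is an assumption borrowed from common distillation practice rather than anything stated in the abstract.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_average(member_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the raw outputs of several models for one input."""
    n = len(member_logits)
    return [sum(column) / n for column in zip(*member_logits)]


def hard_label_loss(student_logits: Sequence[float], true_class: int) -> float:
    """Cross-entropy of the student against the true (one-hot) label."""
    return -math.log(softmax(student_logits)[true_class])


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Cross-entropy of the student against the teacher's soft targets."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


# Hypothetical outputs of three networks that differ only by random seed.
member_logits = [
    [2.0, 0.5, -1.0],
    [1.5, 1.0, -0.5],
    [2.5, 0.0, -1.5],
]
teacher = ensemble_average(member_logits)  # the ensemble acts as the teacher
student = [1.0, 0.2, -0.3]                 # hypothetical single-model output

print("ensemble output:", teacher)
print("soft targets:", softmax(teacher))
print("hard-label loss:", hard_label_loss(student, true_class=0))
print("distillation loss:", distillation_loss(student, teacher))
```

The soft targets assign nonzero probability to the non-true classes, which is one way to picture the "dark knowledge" that a one-hot label does not carry. Under the same assumptions, self-distillation would correspond to passing the output of a single previously trained model of the same architecture as `teacher_logits`.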

Related papers

Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model [5.624791703748109]
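Uncertainty quantification is an important aspect of reinforcement learning and deep learning. This work proposes contextual similarity distillation, a new method that explicitly estimates the variance of an ensemble of deep neural networks with a single model. The method is validated empirically across a range of out-of-distribution detection benchmarks and sparse-reward reinforcement learning environments.
Reference translation of paper (metadata) (2025-03-14T12:09:58Z)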
LAKD-Activation Mapping Distillation Based on Local Learning [12.230042188890838]
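This paper proposes Local Attention Knowledge Distillation (LAKD), a new knowledge distillation framework. LAKD uses the distilled information from the teacher network more efficiently and achieves high interpretability and competitive performance. Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets show that LAKD significantly outperforms existing methods.
Reference translation of paper (metadata) (2024-08-21T09:43:27Z)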
Towards a theory of model distillation [0.0]
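Distillation is the task of replacing a complex machine learning model with a simpler one that approximates the original. The paper shows how to efficiently distill a neural network into a succinct, explicit decision tree representation. It proves that distillation can be far cheaper than learning from scratch and makes progress on characterizing its complexity.
Reference translation of paper (metadata) (2024-03-14T02:42:19Z)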
Learning Discretized Bayesian Networks with GOMEA [0.0]
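The authors extend an existing state-of-the-art structure learning approach to jointly learn variable discretizations. They show that this allows expert knowledge to be incorporated in a uniquely insightful way, and that multiple DBNs can be found that trade off complexity, accuracy, and the difference from a pre-determined expert network.
Reference translation of paper (metadata) (2024-02-19T14:29:35Z)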
Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
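The paper proposes a new method called Knowledge Distillation between Different Distributions (KD$3$). First, useful training instances are dynamically selected from web-collected data according to the combined predictions of the teacher and student networks. In addition, a new contrastive learning block called MixDistribution is built to generate perturbed data with a new distribution for instance alignment.
Reference translation of paper (metadata) (2023-07-21T10:08:58Z)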
Self-Knowledge Distillation for Surgical Phase Recognition [8.708027525926193]
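This paper proposes a self-knowledge distillation framework that can be integrated into current state-of-the-art (SOTA) models. The framework is embedded on top of four popular SOTA approaches and consistently improves their performance.
Reference translation of paper (metadata) (2023-06-15T08:55:00Z)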
Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
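The paper proposes a multimodal intensive ZSL framework that matches image regions with their corresponding semantic embeddings. Extensive experiments on large-scale real-world data are conducted to evaluate the model.
Reference translation of paper (metadata) (2023-06-14T13:07:48Z)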
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
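In 3D action recognition, there is rich complementary information across skeleton modalities. This paper proposes the Cross-modal Mutual Distillation (CMD) framework. The proposed approach outperforms existing self-supervised methods and sets a number of new records.
Reference translation of paper (metadata) (2022-08-26T06:06:09Z)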
Distilling Holistic Knowledge with Graph Neural Networks [37.86539695906857]
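Knowledge distillation (KD) aims to transfer knowledge from a larger teacher network to a smaller learnable student network. Existing KD methods have mainly considered two types of knowledge: individual knowledge and relational knowledge. This paper distills a novel holistic knowledge based on an attributed graph constructed among instances.
Reference translation of paper (metadata) (2021-08-12T02:47:59Z)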
Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification [57.5041270212206]
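This paper proposes BAtch Knowledge Ensembling (BAKE) to produce soft targets for anchor images. BAKE achieves online knowledge ensembling across multiple samples with only a single network. Compared with existing knowledge ensembling methods, it requires minimal computational and memory overhead.
Reference translation of paper (metadata) (2021-04-27T16:11:45Z)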
Towards a Universal Continuous Knowledge Base [49.95342223987143]
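The paper proposes a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks. Text classification experiments show promising results. Knowledge is imported from multiple models into the knowledge base, and the fused knowledge is then exported to a single model.
Reference translation of paper (metadata) (2020-12-25T12:27:44Z)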

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

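This page shows information on the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any outcome arising from its use (including all information and translations).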