Fugu-MT paper translation (abstract): On the Generalization of Knowledge Distillation: An Information-Theoretic View

Paper abstract: On the Generalization of Knowledge Distillation: An Information-Theoretic View

arxiv url: http://arxiv.org/abs/2605.13143v2
Date: Fri, 15 May 2026 03:25:35 GMT
Status: translation complete
Last updated in system: 2026-05-18 21:22:25.965551
Title: On the Generalization of Knowledge Distillation: An Information-Theoretic View
Authors: Bingying Li, Haiyun He
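Abstract summary: We model teacher and student training as coupled stochastic processes and introduce a distillation divergence. We show that the teacher's local flatness can strictly tighten the bound. In a linear Gaussian case study, the distillation divergence admits an interpretable decomposition into bias, variance, and rank-bottleneck costs.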
Reference score (independently computed attention score): 5.248154928825152
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge distillation is widely used to improve generalization in practice, yet its theoretical understanding remains elusive. In the standard distillation setting, a teacher model provides soft predictions to guide the training of a student model. We model teacher and student training as coupled stochastic processes and introduce a distillation divergence, defined as the Kullback-Leibler divergence between these two stochastic kernels. Within this framework, we derive two generalization bounds for the student model relative to the teacher's generalization gap: an upper bound under a sub-Gaussian assumption via algorithmic stability, and a lower bound under a central condition with sharper dependence on the distillation divergence. We further develop a loss-sharpness-aware bound with an explicit tightness regime, showing that the teacher's local flatness can strictly tighten the bound. Additionally, in a linear Gaussian case study, the distillation divergence admits an interpretable decomposition into bias, variance, and rank-bottleneck costs, yielding practical guidance for distillation design.
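The "standard distillation setting" mentioned in the abstract, in which a teacher's soft predictions guide the student, is commonly implemented as a temperature-scaled Kullback-Leibler loss between the two models' output distributions. The sketch below illustrates that common loss only. It is not the paper's distillation divergence, which is defined between the stochastic kernels of the two training processes. The function names, the example logits, and the default temperature are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def soft_label_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Common soft-label distillation loss: T^2 * KL(teacher || student).

    Illustrative only: this compares output distributions on one input,
    whereas the paper's distillation divergence compares training kernels.
    """
    p_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, q_student)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # hypothetical teacher logits
    student = [1.0, 1.0, -0.5]   # hypothetical student logits
    print(soft_label_distillation_loss(teacher, student))
    # A student that matches the teacher exactly incurs zero loss.
    print(soft_label_distillation_loss(teacher, teacher))
```

A higher temperature flattens both distributions, so the loss puts more weight on how the teacher ranks the low-probability classes. The `temperature ** 2` factor is the usual convention for keeping gradient magnitudes comparable across temperatures.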

Related papers

Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation [50.19746127327559]
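We propose a new tail-aware divergence that decouples the contribution of the teacher model's top-K predicted probabilities from that of its lower-probability predictions. Experiments show that the modified distillation method achieves competitive performance in both pre-training and supervised distillation of decoder models.
Paper reference translation (metadata) (2026-02-24T11:54:06Z)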
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model [72.73512218682187]
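ReDiff (a refining-enhanced diffusion framework) teaches the model to identify and correct its own errors. It first instills a foundational revision capability by training the model to revise synthetic errors, and then implements a novel online self-correction loop. This mistake-driven learning gives the model the crucial ability to revisit and refine its existing output, effectively breaking the error cascade.
Paper reference translation (metadata) (2025-10-22T06:58:55Z)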
Knowledge Distillation of Uncertainty using Deep Latent Factor Model [10.148306002388196]
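We introduce a new distribution distillation method called Gaussian distillation, which estimates the distribution of a teacher ensemble through a special Gaussian process called the Deep Latent Factor model (DLF). Using multiple benchmark datasets, we show that the proposed Gaussian distillation outperforms existing baselines.
Paper reference translation (metadata) (2025-10-22T06:46:59Z)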
Knowledge distillation through geometry-aware representational alignment [3.901188865224763]
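We show that existing feature distillation methods cannot capture the feature structure, even at zero loss. We then motivate the use of the Procrustes distance and the Frobenius norm of the feature Gram matrix, distances already common in the context of measuring representational alignment. We show that feature distillation with our method yields statistically significant improvements in distillation performance across language model families.
Paper reference translation (metadata) (2025-09-27T09:59:46Z)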
Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
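Knowledge distillation is a popular technique for improving the performance of a "student" model. The mechanics behind knowledge distillation (KD) are still not fully understood. We show that KD can be interpreted as a novel type of variance reduction mechanism.
Paper reference translation (metadata) (2023-05-27T21:25:55Z)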
Supervision Complexity and its Role in Knowledge Distillation [65.07910515406209]
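We study the generalization behavior of distilled students. The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher's predictions, and the complexity of the teacher's predictions. We demonstrate the effectiveness of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
Paper reference translation (metadata) (2023-01-28T16:34:47Z)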
Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation [72.70058049274664]
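We introduce Referee, a novel framework for sentence summarization that is reference-free (that is, it requires no gold summaries for supervision). Our work is the first to show that reference-free, controlled sentence summarization is feasible through the conceptual framework of symbolic knowledge distillation.
Paper reference translation (metadata) (2022-10-25T07:07:54Z)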

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

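This page shows information about the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.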