Fugu-MT Paper Translation (Abstract): Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty

Paper overview: Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty

arxiv url: http://arxiv.org/abs/2602.12687v1
Date: Fri, 13 Feb 2026 07:43:19 GMT
Status: Translation complete
Last updated in system: 2026-02-16 23:37:53.882571
Title: Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty
Authors: Jeonghyun Kim, SooKyung Kim, Richeng Xuan, Hyunsoo Cho
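Abstract summary: Calibrated Uncertainty Distillation (CUD) is a framework designed to make dark knowledge more faithfully accessible. The approach balances accuracy and calibration, allowing students to benefit from both confident signals on easy cases and structured uncertainty on hard ones.
Reference score (independently computed attention score): 14.807774290798482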
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The core of knowledge distillation lies in transferring the teacher's rich 'dark knowledge': subtle probabilistic patterns that reveal how classes are related and how uncertainty is distributed. While this idea is well established, teachers trained with conventional cross-entropy often fail to preserve such signals. Their distributions collapse into sharp, overconfident peaks that appear decisive but are in fact brittle, offering little beyond the hard label or subtly hindering representation-level transfer. This overconfidence is especially problematic in high-cardinality tasks, where the nuances among many plausible classes matter most for guiding a compact student. Moreover, such brittle targets reduce robustness under distribution shift, leaving students vulnerable to miscalibration in real-world conditions. To address this limitation, we revisit distillation from a distributional perspective and propose Calibrated Uncertainty Distillation (CUD), a framework designed to make dark knowledge more faithfully accessible. Instead of uncritically adopting the teacher's overconfidence, CUD encourages teachers to reveal uncertainty where it is informative and guides students to learn from calibrated targets rather than sharpened certainty. By directly shaping the teacher's predictive distribution before transfer, our approach balances accuracy and calibration, allowing students to benefit from both confident signals on easy cases and structured uncertainty on hard ones. Across diverse benchmarks, CUD yields students that are not only more accurate, but also better calibrated under shift and more reliable on ambiguous, long-tail inputs.
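The abstract does not say how CUD shapes the teacher's distribution, so the following is only a generic sketch of the problem it describes, not the authors' method. It uses temperature-softened targets from classical knowledge distillation to show how a sharp teacher output hides inter-class structure that a softer target exposes. The logits, the temperature value, and all function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; low entropy means a sharp, confident distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the usual soft-target distillation loss for a single example."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Hypothetical logits for a 4-class problem (illustrative values only).
teacher_logits = [9.0, 4.0, 3.5, -2.0]
student_logits = [2.0, 1.0, 0.5, -1.0]

# At temperature 1 the teacher is nearly one-hot: little beyond the hard label.
sharp_target = softmax(teacher_logits, temperature=1.0)

# Softening (an assumed stand-in for calibration) exposes the class similarities.
soft_target = softmax(teacher_logits, temperature=4.0)

# The student is compared against the softened teacher at the same temperature.
student_probs = softmax(student_logits, temperature=4.0)
loss = kl_divergence(soft_target, student_probs)

print("sharp target entropy:", entropy(sharp_target))
print("soft target entropy: ", entropy(soft_target))
print("distillation loss:   ", loss)
```

In this sketch the softened target has higher entropy than the sharp one, so the student's loss carries information about the non-argmax classes. A single global temperature is the simplest possible choice here; the abstract suggests that CUD instead shapes the teacher's predictive distribution directly before transfer.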

Related papers