Fugu-MT Paper Translation (Abstract): Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions

Paper overview: Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions

arxiv url: http://arxiv.org/abs/2606.12171v1
Date: Wed, 10 Jun 2026 14:59:26 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.515025
Title: Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions
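Title (reference translation): Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions
Authors: José Medina, Paul Honeine, Abdelaziz Bensrhair, Amnir Hadachi
Abstract summary: Knowledge distillation and mixup have both proven effective at inducing smoothness in class boundaries. Their interaction is poorly understood, particularly when mixup is applied only during student training. The authors show that the resulting mismatch causes the teacher's supervisory signal to be dominated by distributional confusion.
Reference score (independently computed attention measure): 4.672326975246762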
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Knowledge Distillation (KD) and mixup have proven effective at inducing smoothness in class boundaries; KD captures inherent class relationships in probability distributions, and mixup enforces them through convex combinations of inputs. Their interaction, however, remains poorly understood, particularly when mixup is applied only during student training. In this setting, the teacher is queried on inputs drawn from a vicinal distribution it never saw during training, a controlled mismatch whose effect on knowledge transfer has not been characterised. We show that this mismatch causes the teacher's supervisory signal to be dominated by distributional confusion rather than inter-class structure. Despite it, the student does not merely imitate the teacher: it independently acquires greater linearity in the vicinal region, a structural property that the teacher lacks, and goes beyond dark-knowledge transfer. KD with mixup consistently improves student accuracy and reduces overconfidence by an order of magnitude relative to the baseline, across CIFAR and ImageNet with varying-capacity teachers. Crucially, calibration propagates from teacher to student independently of accuracy transfer, and temperature scaling governs a measurable accuracy-calibration trade-off that becomes more pronounced under vicinal training. These results reframe mixup distillation not as a degraded version of standard KD, but as a richer transfer channel that simultaneously shapes discriminative performance, uncertainty estimation, and representational geometry.
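Abstract (reference translation): Knowledge distillation (KD) and mixup have both proven effective at inducing smoothness in class boundaries: KD captures the inherent class relationships in probability distributions, and mixup enforces them through convex combinations of inputs. Their interaction, however, is not well understood, particularly when mixup is applied only during student training. In this setting the teacher is queried on inputs drawn from a vicinal distribution it never saw during training, a controlled mismatch whose effect on knowledge transfer has not been characterised. The authors show that this mismatch causes the teacher's supervisory signal to be dominated by distributional confusion rather than inter-class structure. Even so, the student does not merely imitate the teacher: it independently acquires greater linearity in the vicinal region, a structural property the teacher lacks. KD with mixup consistently improves student accuracy and reduces overconfidence by an order of magnitude relative to the baseline, across CIFAR and ImageNet with teachers of varying capacity. Importantly, calibration propagates from teacher to student independently of accuracy transfer, and temperature scaling governs a measurable accuracy-calibration trade-off that becomes more pronounced under vicinal training. These results reframe mixup distillation not as a degraded version of standard KD but as a richer transfer channel that simultaneously shapes discriminative performance, uncertainty estimation, and representational geometry.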
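The setting described in the abstract, where mixup is applied only on the student side and the teacher is queried on the mixed inputs, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the loss, so the sketch assumes the commonly used temperature-scaled KD objective combined with a mixup cross-entropy term. The names `mixup_batch`, `kd_mixup_loss`, `kd_weight`, and the linear toy "models" `W_teacher` and `W_student` are hypothetical and exist only for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mixup_batch(x, y_onehot, alpha, rng):
    """Convex combination of a batch with a shuffled copy of itself."""
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))        # partner sample for each input
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


def kd_mixup_loss(student_logits, teacher_logits, y_mix,
                  temperature=4.0, kd_weight=0.5):
    """Assumed objective: mixup cross-entropy plus temperature-scaled KL."""
    eps = 1e-12
    # Cross-entropy of the student against the mixed (soft) labels.
    ce = -(y_mix * np.log(softmax(student_logits) + eps)).sum(axis=-1).mean()
    # KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=-1).mean()
    # The T^2 factor is the usual gradient-scale correction in KD.
    return (1.0 - kd_weight) * ce + kd_weight * temperature ** 2 * kl


# Toy usage: linear stand-ins for the teacher and student networks.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = np.eye(3)[rng.integers(0, 3, size=8)]
W_teacher = rng.normal(size=(5, 3))       # hypothetical frozen teacher
W_student = rng.normal(size=(5, 3))       # hypothetical student

x_mix, y_mix, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
# The teacher only ever sees mixed inputs at distillation time.
loss = kd_mixup_loss(x_mix @ W_student, x_mix @ W_teacher, y_mix)
print(f"lambda={lam:.3f}  loss={loss:.4f}")
```

In this sketch the teacher receives `x_mix` even though it was never trained on such inputs, which mirrors the controlled mismatch the abstract discusses. The `temperature` argument is the knob that the abstract associates with the accuracy-calibration trade-off; the specific values used here are arbitrary.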

Related papers