Fugu-MT Paper Translation (Abstract): Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers

Paper overview: Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers

arxiv url: http://arxiv.org/abs/2511.06848v2
Date: Sat, 15 Nov 2025 16:34:36 GMT
Status: Translation complete
Last updated in system: 2025-11-18 14:36:22.082618
Title: Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers
Authors: Huiyuan Tian, Bonan Xu, Shijian Li
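Abstract summary: The paper analyzes why feature-based distillation fails on Vision Transformers (ViTs) through a new analytical framework called "distillation dynamics". It identifies the root cause of negative transfer in feature distillation as a fundamental representational paradigm mismatch between teacher and student models. The findings indicate that successful knowledge transfer in ViTs requires moving beyond naive feature mimicry to methods that respect these fundamental representational constraints.
Reference score (independently computed attention score): 4.712287472749922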
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While feature-based knowledge distillation has proven highly effective for compressing CNNs, these techniques unexpectedly fail when applied to Vision Transformers (ViTs), often performing worse than simple logit-based distillation. We provide the first comprehensive analysis of this phenomenon through a novel analytical framework termed "distillation dynamics", combining frequency spectrum analysis, information entropy metrics, and activation magnitude tracking. Our investigation reveals that ViTs exhibit a distinctive U-shaped information processing pattern: initial compression followed by expansion. We identify the root cause of negative transfer in feature distillation: a fundamental representational paradigm mismatch between teacher and student models. Through frequency-domain analysis, we show that teacher models employ distributed, high-dimensional encoding strategies in later layers that smaller student models cannot replicate due to limited channel capacity. This mismatch causes late-layer feature alignment to actively harm student performance. Our findings reveal that successful knowledge transfer in ViTs requires moving beyond naive feature mimicry to methods that respect these fundamental representational constraints, providing essential theoretical guidance for designing effective ViT compression strategies. All source code and experimental logs are provided at https://github.com/thy960112/Distillation-Dynamics.
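For orientation, the two distillation styles that the abstract contrasts can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the linear projector, the tensor shapes, and the widths 192 and 768 are all hypothetical, and the data are random. The final rank check is only one simple way to picture a channel-capacity limit (a linear map from a narrower student cannot produce features of higher rank than the student width); it is not claimed to be the paper's argument.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def logit_kd_loss(student_logits, teacher_logits, temperature=1.0):
    # Logit-based distillation: KL(teacher || student) on
    # temperature-softened class distributions, scaled by T^2.
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

def feature_mimicry_loss(student_feats, teacher_feats, projector):
    # Feature-based distillation (assumed form): mean squared error between
    # linearly projected student features and teacher features.
    projected = student_feats @ projector
    return float(np.mean((projected - teacher_feats) ** 2))

rng = np.random.default_rng(0)
batch, tokens, classes = 4, 197, 10   # hypothetical sizes
d_student, d_teacher = 192, 768       # hypothetical channel widths

teacher_logits = rng.normal(size=(batch, classes))
student_logits = rng.normal(size=(batch, classes))
teacher_feats = rng.normal(size=(batch, tokens, d_teacher))
student_feats = rng.normal(size=(batch, tokens, d_student))
# Hypothetical projector that lifts student channels to the teacher width.
projector = rng.normal(size=(d_student, d_teacher)) / np.sqrt(d_student)

kd = logit_kd_loss(student_logits, teacher_logits, temperature=2.0)
fm = feature_mimicry_loss(student_feats, teacher_feats, projector)

# Rank of the projected student features is bounded by the student width,
# regardless of how the projector is chosen.
projected_flat = (student_feats @ projector).reshape(-1, d_teacher)
projected_rank = int(np.linalg.matrix_rank(projected_flat))

print(f"logit KD loss: {kd:.4f}")
print(f"feature mimicry loss: {fm:.4f}")
print(f"rank of projected student features: {projected_rank} (<= {d_student})")
```

In a real training setup the projector would be learned jointly with the student, and the two losses would typically be weighted and added to the task loss; those choices are left out here because the abstract does not specify them.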
