Fugu-MT Paper Translation (Summary): Understanding the Effects of Projectors in Knowledge Distillation

Paper summary: Understanding the Effects of Projectors in Knowledge Distillation

arxiv url: http://arxiv.org/abs/2310.17183v1
Date: Thu, 26 Oct 2023 06:30:39 GMT
Status: Translation complete
Last updated in system: 2023-10-27 21:40:59.421038
Title: Understanding the Effects of Projectors in Knowledge Distillation
Title (reference translation): Understanding the Effects of Projectors in Knowledge Distillation
Authors: Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Brano Kusy, Zi Huang
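Abstract summary: Adding a projector improves distillation performance even when the student and the teacher have the same feature dimensions. This paper investigates the implicit role that projectors play, which has so far been overlooked. Motivated by the positive effects of projectors, it proposes a feature distillation method based on a projector ensemble that further improves distillation performance.
Reference score (attention score computed in-house): 31.882356225974632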
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conventionally, during the knowledge distillation process (e.g. feature distillation), an additional projector is often required to perform feature transformation due to the dimension mismatch between the teacher and the student networks. Interestingly, we discovered that even if the student and the teacher have the same feature dimensions, adding a projector still helps to improve the distillation performance. In addition, projectors even improve logit distillation if we add them to the architecture too. Inspired by these surprising findings and the general lack of understanding of the projectors in the knowledge distillation process from existing literature, this paper investigates the implicit role that projectors play, which has so far been overlooked. Our empirical study shows that the student with a projector (1) obtains a better trade-off between the training accuracy and the testing accuracy compared to the student without a projector when it has the same feature dimensions as the teacher, (2) better preserves its similarity to the teacher beyond shallow and numeric resemblance, from the view of Centered Kernel Alignment (CKA), and (3) avoids being over-confident as the teacher does at the testing phase. Motivated by the positive effects of projectors, we propose a projector ensemble-based feature distillation method to further improve distillation performance. Despite the simplicity of the proposed strategy, empirical results from the evaluation of classification tasks on benchmark datasets demonstrate the superior classification performance of our method on a broad range of teacher-student pairs and verify from the aspects of CKA and model calibration that the student's features are of improved quality with the projector ensemble design.
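Abstract (reference translation): Traditionally, in the knowledge distillation process (e.g., feature distillation), an additional projector is often needed to transform features because of the dimension mismatch between the teacher and student networks. Interestingly, we found that even when the student and the teacher have the same feature dimensions, adding a projector can improve distillation performance. Furthermore, adding projectors to the architecture also improves logit distillation. Inspired by these surprising findings and by the lack of understanding of projectors in the knowledge distillation process in the existing literature, we examine the implicit role that projectors play. This study demonstrates that (1) a student with a projector achieves a better trade-off between training accuracy and test accuracy than a student without a projector, (2) its similarity to the teacher is preserved beyond shallow, numerical resemblance, from the perspective of centered kernel alignment (CKA), and (3) it avoids being over-confident at the testing phase as the teacher is. Motivated by the positive effects of projectors, we propose a feature distillation method using a projector ensemble to further improve distillation performance. Despite the simplicity of the proposed method, evaluations on classification tasks with benchmark datasets show its superior classification performance across a wide range of teacher-student pairs. From the perspectives of CKA and model calibration, they also verify that the projector ensemble design improves the quality of the student's features.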
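The abstract describes the method only at a high level. Below is a minimal NumPy sketch, not the authors' implementation, of two ideas it mentions. The first is averaging the outputs of several projectors applied to student features and comparing the result with teacher features. The second is the standard linear form of CKA, used here to measure feature similarity.

The sketch rests on several assumptions:
- Each projector is a single randomly initialised linear layer followed by ReLU.
- The distillation loss is plain mean squared error.
- The helper names (`make_projectors`, `ensemble_project`, `feature_distillation_loss`) are hypothetical.

The paper's actual projector architecture, loss and training procedure may differ. Only the forward computation is shown; training the student and projectors would need an autodiff framework.

```python
import numpy as np


def make_projectors(num_projectors, d_student, d_teacher, seed=0):
    """Hypothetical helper: randomly initialised linear projectors as (weight, bias) pairs."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_student)
    return [
        (rng.normal(0.0, scale, size=(d_student, d_teacher)), np.zeros(d_teacher))
        for _ in range(num_projectors)
    ]


def ensemble_project(student_feats, projectors):
    """Average the outputs of all projectors (assumed: linear map followed by ReLU)."""
    outputs = [np.maximum(student_feats @ w + b, 0.0) for w, b in projectors]
    return np.mean(outputs, axis=0)


def feature_distillation_loss(student_feats, teacher_feats, projectors):
    """Assumed loss: MSE between ensemble-projected student features and teacher features."""
    projected = ensemble_project(student_feats, projectors)
    return float(np.mean((projected - teacher_feats) ** 2))


def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (n_samples, dim)."""
    # Centre each feature dimension across samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Usage with random stand-in features; the student and teacher share the same dimension (64).
rng = np.random.default_rng(1)
student = rng.normal(size=(32, 64))
teacher = rng.normal(size=(32, 64))
projectors = make_projectors(num_projectors=3, d_student=64, d_teacher=64)
loss = feature_distillation_loss(student, teacher, projectors)
print(f"distillation loss: {loss:.4f}")
print(f"CKA(student, student): {linear_cka(student, student):.4f}")
print(f"CKA(student, teacher): {linear_cka(student, teacher):.4f}")
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features. That invariance is why it compares representations beyond raw numeric closeness. For example, `linear_cka(student, 3.0 * student)` is 1 up to floating-point error.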
