Fugu-MT Paper Translation (Summary): An Empirical Study of Knowledge Distillation for Code Understanding Tasks

Paper summary: An Empirical Study of Knowledge Distillation for Code Understanding Tasks

arxiv url: http://arxiv.org/abs/2508.15423v1
Date: Thu, 21 Aug 2025 10:24:48 GMT
Status: Translation complete
Last updated (in system): 2025-08-22 16:26:46.278559
Title: An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Title (reference translation): An Empirical Study of Knowledge Distillation in Code Understanding Tasks
Authors: Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao,
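Abstract summary: Knowledge distillation (KD) addresses the computational cost and inference latency of deploying large pre-trained language models by transferring knowledge from large teacher models to compact student models. This paper systematically investigates the effectiveness and usage of KD in code understanding tasks.
Reference score (attention score computed by the site): 19.64130505527951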
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pre-trained language models (PLMs) have emerged as powerful tools for code understanding. However, deploying these PLMs in large-scale applications faces practical challenges due to their computational intensity and inference latency. Knowledge distillation (KD), a promising model compression and acceleration technique, addresses these limitations by transferring knowledge from large teacher models to compact student models, enabling efficient inference while preserving most of the teacher models' capabilities. While this technique has shown remarkable success in natural language processing and computer vision, its potential for code understanding tasks remains largely underexplored. In this paper, we systematically investigate the effectiveness and usage of KD in code understanding tasks. Our study covers two popular types of KD methods, i.e., logit-based and feature-based KD methods, experimenting with eight student models and two teacher PLMs from different domains on three downstream tasks. The experimental results indicate that KD consistently offers notable performance gains over standard fine-tuning across student models of different sizes. Notably, a code-specific PLM is more effective as the teacher model. Among all KD methods, the latest feature-based KD methods perform best, enabling student models to retain up to 98% of the teacher's performance with merely 5% of its parameters. Regarding student architecture, our experiments reveal that similarity to the teacher architecture does not necessarily lead to better performance. We further discuss efficiency and model behavior during the KD process and at inference, summarize the implications of our findings, and identify promising future directions.
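Abstract (reference translation): Pre-trained language models (PLMs) have emerged as powerful tools for code understanding. However, deploying these PLMs in large-scale applications faces practical challenges due to their computational intensity and inference latency. Knowledge distillation (KD), a promising model compression and acceleration technique, addresses these limitations by transferring knowledge from large teacher models to compact student models, enabling efficient inference while preserving most of the teacher models' capabilities. Although this technique has been notably successful in natural language processing and computer vision, its potential for code understanding tasks remains largely unexplored. This paper systematically investigates the effectiveness and usage of KD in code understanding tasks. The study covers two types of KD methods, namely logit-based and feature-based KD, using eight student models and two teacher PLMs from different domains on three downstream tasks. The experimental results show that KD delivers notable performance gains over standard fine-tuning across student models. In particular, a code-specific PLM is more effective as the teacher model. Among all KD methods, the latest feature-based methods perform best, allowing student models to retain up to 98% of the teacher's performance with only 5% of its parameters. Regarding student architecture, the experiments show that similarity to the teacher architecture does not necessarily yield better performance. The paper further discusses the efficiency and behavior of the KD process and inference, summarizes the implications of the findings, and identifies promising future directions.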
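The abstract contrasts two families of KD: logit-based methods, where the student matches the teacher's output distribution, and feature-based methods, where the student matches the teacher's intermediate representations. The sketch below shows one common textbook form of each loss in plain Python so the arithmetic is visible. It is not the paper's implementation and does not reproduce the specific KD methods it evaluates. The temperature, the weight `alpha`, the hypothetical projection matrix `weight` (which maps a smaller student hidden size onto the teacher's), and all helper names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logit_kd_loss(student_logits: Sequence[float],
                  teacher_logits: Sequence[float],
                  label: int,
                  temperature: float = 2.0,
                  alpha: float = 0.5) -> float:
    """Logit-based KD (soft-target style): KL to the teacher plus hard-label cross-entropy.

    temperature and alpha are illustrative defaults, not values taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Multiplying by T^2 keeps the soft-target term's gradient scale comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Standard cross-entropy on the ground-truth label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


def project(features: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear projection; weight has shape [teacher_dim][student_dim]."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def feature_kd_loss(student_hidden: Sequence[float],
                    teacher_hidden: Sequence[float],
                    weight: Sequence[Sequence[float]]) -> float:
    """Feature-based KD: mean squared error between projected student and teacher hidden states."""
    projected = project(student_hidden, weight)
    return sum((s - t) ** 2 for s, t in zip(projected, teacher_hidden)) / len(teacher_hidden)


if __name__ == "__main__":
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.0, 0.2, -0.5]
    print("logit KD loss:", logit_kd_loss(student_logits, teacher_logits, label=0))

    # A 2-d student hidden state projected into a 3-d teacher space.
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print("feature KD loss:", feature_kd_loss([0.5, 0.25], [0.5, 0.25, 0.75], W))  # 0.0
```

In a real training loop these losses would be computed on batched tensors in a deep-learning framework, and the projection matrix would be learned jointly with the student; the scalar version only illustrates what each loss measures.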


Related papers