Fugu-MT Paper Translation (Summary): MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models

Paper summary: MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models

arxiv url: http://arxiv.org/abs/2603.13213v1
Date: Fri, 13 Mar 2026 17:49:01 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.230418
Title: MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models
Authors: Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
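Abstract summary: Large language models for code have achieved strong performance across diverse software analytics tasks. Knowledge distillation (KD) offers a practical solution by transferring knowledge from a large model to a smaller, more efficient model. MoEKD decomposes the distillation process into expert and router training, aggregation of expert knowledge through a learned routing mechanism, and distillation from the aggregated knowledge.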
Reference score (independently computed attention score): 6.25009626782699
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains limited by high computational demands, slow inference speeds, significant energy consumption, and environmental impact. Knowledge distillation (KD) offers a practical solution by transferring knowledge from a large model to a smaller and more efficient model. Despite its effectiveness, recent studies show that models distilled from a single source often exhibit degraded adversarial robustness, even when robustness-aware distillation techniques are employed. These observations suggest a fundamental limitation of single-source distillation in simultaneously transferring high-quality and robust knowledge. To overcome this limitation, we propose Mixture of Experts Knowledge Distillation (MoEKD), a KD framework that leverages a Mixture of Experts (MoE) architecture to enable more effective and robust knowledge transfer from multiple specialized experts into a compact model. MoEKD decomposes the distillation process into expert and router training, aggregation of expert knowledge through a learned routing mechanism, and distillation from the aggregated knowledge. We evaluate MoEKD on the vulnerability detection task using CodeBERT and GraphCodeBERT models. Experimental results show that MoEKD not only improves adversarial robustness by up to 35.8%, but also enhances predictive performance by up to 13%, compared to state-of-the-art KD baselines, including Compressor and AVATAR. Furthermore, an ablation study demonstrates that aggregating expert knowledge enables ultra-compact models to maintain competitive performance even when their size is reduced by approximately half. Overall, these results highlight the effectiveness of multi-expert knowledge aggregation in addressing key limitations of existing single-source KD approaches.
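The abstract names three stages (expert and router training, routed aggregation, distillation from the aggregate) but gives no formulas. The following is a minimal sketch of the last two stages only, under stated assumptions: the router is taken to produce one logit per expert, the experts' softened class distributions are mixed with softmax router weights, and the student is trained with a KL-divergence loss at temperature `temperature`. All function names, the toy logits, and the choice of mixing in probability space are hypothetical illustrations, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_experts(expert_logits, router_logits, temperature=1.0):
    """Mix the experts' softened class distributions using router weights.

    Assumption: the router emits one logit per expert for the current input.
    """
    weights = softmax(router_logits)
    n_classes = len(expert_logits[0])
    mixed = [0.0] * n_classes
    for weight, logits in zip(weights, expert_logits):
        probs = softmax(logits, temperature)
        for c in range(n_classes):
            mixed[c] += weight * probs[c]
    return mixed


def kd_loss(student_logits, teacher_probs, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions."""
    student_probs = softmax(student_logits, temperature)
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )


# Toy binary task (vulnerable / not vulnerable) with three hypothetical experts.
expert_logits = [[2.0, 0.5], [0.2, 1.5], [1.0, 1.0]]
router_logits = [1.2, 0.3, -0.5]

teacher = aggregate_experts(expert_logits, router_logits, temperature=2.0)
loss = kd_loss([0.1, 0.4], teacher, temperature=2.0)
print("aggregated teacher distribution:", teacher)
print("distillation loss:", loss)
```

In a real pipeline the expert and router logits would come from trained models such as fine-tuned CodeBERT or GraphCodeBERT variants, and the loss would typically be combined with a supervised term on the ground-truth labels.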

Related papers