Fugu-MT Paper Translation (Summary): Hybrid Policy Distillation for LLMs

Paper overview: Hybrid Policy Distillation for LLMs

arXiv URL: http://arxiv.org/abs/2604.20244v1
Date: Wed, 22 Apr 2026 06:46:22 GMT
Status: Translation complete
Last updated (in system): 2026-04-23 15:36:10.996194
Title: Hybrid Policy Distillation for LLMs
Authors: Wenhong Zhu, Ruobing Xie, Rui Wang, Pengfei Liu
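Abstract summary: Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs). We break down the design of existing KD methods and present a unified view that establishes connections between them. We propose Hybrid Policy Distillation (HPD), which integrates the complementary advantages of forward and reverse KL to balance mode coverage and mode-seeking.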
Reference score (independently calculated attention score): 40.69103815149454
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections between them, reformulating KD as a reweighted log-likelihood objective at the token level. We further propose Hybrid Policy Distillation (HPD), which integrates the complementary advantages of forward and reverse KL to balance mode coverage and mode-seeking, and combines off-policy data with lightweight, approximate on-policy sampling. We validate HPD on long-generation math reasoning as well as short-generation dialogue and code tasks, demonstrating improved optimization stability, computational efficiency, and final performance across diverse model families and scales. The code related to this work is available at https://github.com/zwhong714/Hybrid-Policy-Distillation.
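The abstract turns on two textbook facts. First, forward KL, KL(p || q), penalizes a student q that misses teacher modes (mode coverage), while reverse KL, KL(q || p), penalizes a student that puts mass where the teacher p has little (mode-seeking). Second, minimizing forward KL over q is equivalent to maximizing the unweighted log-likelihood E_{y~p}[log q(y)], whereas the reverse-KL gradient is a log-likelihood gradient with per-token weights. The Python below is a minimal single-token sketch of those facts under stated assumptions, not the authors' HPD method: every function name is hypothetical, and `hybrid_kl` with its mixing weight `beta` is only an illustrative convex combination of the two directions. The authors' actual implementation is in the repository linked in the abstract.

```python
import math
from typing import List, Sequence


def forward_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for teacher p and student q at one token position.

    Large when q puts too little mass where p has mass (mode covering).
    Assumes q > 0 wherever p > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reverse_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(q || p): large when q puts mass where p has little (mode seeking).

    Assumes p > 0 wherever q > 0.
    """
    return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q) if qi > 0)


def hybrid_kl(p: Sequence[float], q: Sequence[float], beta: float = 0.5) -> float:
    """Hypothetical convex mix beta * KL(p||q) + (1 - beta) * KL(q||p).

    Illustrative only; this is not the HPD objective from the paper.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * forward_kl(p, q) + (1.0 - beta) * reverse_kl(p, q)


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over student logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def reverse_kl_grad(p: Sequence[float], z: Sequence[float]) -> List[float]:
    """Gradient of KL(q || p) w.r.t. logits z, where q = softmax(z).

    Written as a reweighted log-likelihood gradient:
        sum_y q(y) * w(y) * d log q(y) / d z_j,  with w(y) = log q(y) - log p(y)
    held constant. Gradient descent on it is therefore gradient ascent on
    sum_y q(y) * (log p(y) - log q(y)) * log q(y) with the weights frozen.
    """
    q = softmax(z)
    w = [math.log(qi) - math.log(pi) for pi, qi in zip(p, q)]
    n = len(q)
    # For a softmax, d log q(y) / d z_j = [y == j] - q_j
    return [
        sum(q[y] * w[y] * ((1.0 if y == j else 0.0) - q[j]) for y in range(n))
        for j in range(n)
    ]


if __name__ == "__main__":
    teacher = [0.45, 0.05, 0.05, 0.45]   # bimodal teacher over 4 tokens
    covering = [0.25, 0.25, 0.25, 0.25]  # spreads mass over both modes
    seeking = [0.85, 0.05, 0.05, 0.05]   # commits to a single mode
    for name, q in [("covering", covering), ("seeking", seeking)]:
        print(name, round(forward_kl(teacher, q), 3), round(reverse_kl(teacher, q), 3))
    # covering 0.368 0.511
    # seeking 0.703 0.431
```

With the bimodal `teacher`, forward KL prefers the `covering` student and reverse KL prefers the `seeking` one, which is the trade-off HPD is described as balancing. A central finite-difference check of `reverse_kl(teacher, softmax(z))` confirms that `reverse_kl_grad` is the exact gradient of the reverse KL.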
