Fugu-MT Paper Translation (Abstract): BARD: budget-aware reasoning distillation

Paper overview: BARD: budget-aware reasoning distillation

arxiv url: http://arxiv.org/abs/2511.01470v1
Date: Mon, 03 Nov 2025 11:30:18 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:27.242666
Title: BARD: budget-aware reasoning distillation
Title (reference translation): BARD: budget-aware distillation
Authors: Lujie Niu, Lei Shen, Yi Jiang, Caixia Yuan, Xiaojie Wang, Wenbo Su, Bo Zheng
Abstract summary: Long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models. We propose Budget-Aware Reasoning Distillation (BARD).
Reference score (independently computed attention score): 25.725960386304646
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: While long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models, the reasoning process often remains redundant and the computational budget uncontrollable, leading to inefficient resource usage. To address this limitation, we propose Budget-Aware Reasoning Distillation (BARD), a novel framework that simultaneously distills reasoning capability and enables fine-grained control over the reasoning length. BARD uses the thinking budget as a user-specified control signal, allowing the model to dynamically balance reasoning performance and computational efficiency. To achieve this, BARD introduces a two-phase training regimen. The first phase is Supervised Fine-Tuning (SFT) on teacher-generated long CoT data compressed to various budget levels, which bootstraps the model's understanding of budget constraints. The second phase applies Reinforcement Learning (RL) with a reward signal that accounts for reasoning performance and budget fidelity simultaneously. The two-phase regimen is crucial to avoiding policy degradation and ensuring that both objectives are optimized jointly. Extensive experiments demonstrate that our method empowers an 8B student model to achieve strong performance on challenging reasoning benchmarks (AIME24, AIME25, GPQA) while providing precise and adaptive control over its reasoning length across a wide range of budgets.
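Abstract (reference translation): Long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models, but the reasoning process is redundant and the computational budget is uncontrollable, which leads to inefficient resource usage. To address this limitation, we propose Budget-Aware Reasoning Distillation (BARD), a new framework that distills reasoning capability and, at the same time, enables fine-grained control over reasoning length. BARD uses the thinking budget as a user-specified control signal and dynamically balances reasoning performance against computational efficiency. To realize this concept, BARD introduces a two-phase training regimen. In the first phase, teacher-generated long CoT data is compressed to various budget levels, bootstrapping the model's understanding of budget constraints. The second phase uses Reinforcement Learning (RL) from a reward signal that considers reasoning performance and budget fidelity simultaneously. Incorporating the two-phase regimen is essential for avoiding policy degradation and ensuring that both objectives are optimized jointly. Extensive experiments show that the proposed method enables an 8B student model to achieve high performance on challenging reasoning benchmarks (AIME24, AIME25, GPQA) while precisely and adaptively controlling reasoning length across a variety of budgets.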
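
The abstract says the RL phase uses a reward that considers reasoning performance and budget fidelity together, but it does not give the formula. The following is only a minimal sketch of what such a combined reward could look like. The function name `budget_aware_reward`, the linear weighting, the `fidelity_weight` parameter, and the capped relative-deviation penalty are all assumptions made for illustration, not the authors' definition.

```python
def budget_aware_reward(
    is_correct: bool,
    used_tokens: int,
    budget_tokens: int,
    fidelity_weight: float = 0.5,  # hypothetical trade-off knob, not from the paper
) -> float:
    """Hypothetical reward mixing answer correctness with budget fidelity."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    if not 0.0 <= fidelity_weight <= 1.0:
        raise ValueError("fidelity_weight must be in [0, 1]")

    # Reasoning performance: 1.0 for a correct final answer, else 0.0.
    correctness = 1.0 if is_correct else 0.0

    # Budget fidelity: 1.0 when the reasoning length matches the requested
    # budget exactly, falling linearly to 0.0 as the relative deviation
    # reaches 100% of the budget (assumed shape).
    deviation = min(abs(used_tokens - budget_tokens) / budget_tokens, 1.0)
    fidelity = 1.0 - deviation

    return (1.0 - fidelity_weight) * correctness + fidelity_weight * fidelity


if __name__ == "__main__":
    # A correct answer that uses half of a 1000-token budget.
    print(budget_aware_reward(True, used_tokens=500, budget_tokens=1000))
```

Under these assumptions, a correct answer that hits the budget exactly scores 1.0, while a correct answer that badly overshoots the budget scores no better than an incorrect one that respects it. This is one simple way to express that both objectives matter during optimization.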

List of related papers