Fugu-MT Paper Translation (Abstract): Structural Rationale Distillation via Reasoning Space Compression

Paper overview: Structural Rationale Distillation via Reasoning Space Compression

arxiv url: http://arxiv.org/abs/2605.07139v1
Date: Fri, 08 May 2026 02:15:52 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.743781
Title: Structural Rationale Distillation via Reasoning Space Compression
Title (reference translation): Structural Rationale Distillation via Reasoning Space Compression
Authors: Jialin Yang, Jiankun Wang, Jiajun Wu, Henry Leung, Jiayu Zhou, Steve Drew
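Abstract summary: Distillation through Reasoning Path Compression (D-RPC) constrains the teacher to follow a compact, dynamically maintained bank of reusable high-level reasoning paths. For each training question, D-RPC retrieves the most relevant path and conditions the teacher to follow it, producing rationales that are consistent across similar problems yet diverse enough to cover different problem types.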
Reference score (independently computed attention level): 34.91106623292321
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When distilling reasoning from large language models (LLMs) into smaller ones, teacher rationales for similar problems often vary wildly in structure and strategy. Like a chef who makes the same dish differently each time, this inconsistency burdens the student with noisy supervision that is hard to internalize. We propose Distillation through Reasoning Path Compression (D-RPC), which constrains the teacher to follow a compact, dynamically maintained bank of reusable high-level reasoning paths. For each training question, D-RPC retrieves the most relevant path and conditions the teacher to follow it, producing rationales that are consistent across similar problems yet diverse enough to cover different problem types. A PAC-Bayes analysis formalizes the resulting trade-off between bank size and coverage: smaller banks reduce supervision entropy but risk coverage gaps, and the generalization bound identifies an optimal intermediate size confirmed by our ablations. Across five math and commonsense reasoning benchmarks with two student models, D-RPC consistently outperforms chain-of-thought distillation, freeform rationale generation, direct distillation, and structured-supervision baselines, while using fewer tokens than template-heavy alternatives.
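Abstract (reference translation): When reasoning is distilled from large language models (LLMs) into smaller models, the teacher's rationales for similar problems often differ greatly in structure and strategy. Like a chef who prepares the same dish differently each time, this inconsistency burdens the student with noisy supervision that is hard to internalize. This paper proposes Distillation through Reasoning Path Compression (D-RPC), which constrains the teacher to follow a compact, dynamically maintained bank of reusable high-level reasoning paths. For each training question, D-RPC retrieves the most relevant path and conditions the teacher to follow it, producing rationales that are consistent across similar problems yet diverse enough to cover different problem types. A PAC-Bayes analysis formalizes the trade-off between bank size and coverage: smaller banks reduce supervision entropy but risk coverage gaps, and the generalization bound identifies an optimal intermediate size that the ablations confirm. Across five math and commonsense reasoning benchmarks with two student models, D-RPC consistently outperforms chain-of-thought distillation, freeform rationale generation, direct distillation, and structured-supervision baselines, while using fewer tokens than template-heavy alternatives.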
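The retrieve-then-condition step described in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how D-RPC scores relevance, how the bank is built or maintained, or what the teacher prompt looks like. The sketch swaps in plain bag-of-words cosine similarity, a hand-written three-entry bank (`PATH_BANK`), and hypothetical helper names (`retrieve_path`, `build_teacher_prompt`).

```python
import math
import re
from collections import Counter

# Hypothetical bank of high-level reasoning paths (illustrative only,
# not taken from the paper).
PATH_BANK = {
    "rate_problem": (
        "Identify the rate, time, and distance quantities; "
        "write the rate equation; solve for the unknown."
    ),
    "percentage_change": (
        "Identify the original price and the percent change; "
        "compute the change amount; apply it to the original price."
    ),
    "count_combinations": (
        "Identify the items to choose and any order constraints; "
        "pick the counting rule; compute the count."
    ),
}


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_path(question, bank):
    """Return the name of the bank entry most similar to the question.

    Assumption: relevance is word-overlap cosine similarity; the paper
    may use a different retriever.
    """
    query = bag_of_words(question)
    return max(bank, key=lambda name: cosine(query, bag_of_words(bank[name])))


def build_teacher_prompt(question, bank):
    """Condition the teacher on the retrieved path (hypothetical prompt format)."""
    name = retrieve_path(question, bank)
    prompt = (
        f"Follow this reasoning path: {bank[name]}\n"
        f"Question: {question}\n"
        "Rationale:"
    )
    return name, prompt


if __name__ == "__main__":
    q = "A train covers a distance of 120 km in 2 hours of time. What is its rate?"
    chosen, teacher_prompt = build_teacher_prompt(q, PATH_BANK)
    print(chosen)
    print(teacher_prompt)
```

Because similar questions map to the same bank entry, the teacher receives the same structural instruction for each of them, which is the consistency property the abstract emphasizes. A larger bank would cover more problem types at the cost of more varied supervision.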

Related papers