Fugu-MT Paper Translation (Abstract): Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Paper abstract: Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

arxiv url: http://arxiv.org/abs/2506.04723v1
Date: Thu, 05 Jun 2025 07:53:59 GMT
Status: translation complete
Last updated in system: 2025-06-06 21:53:49.593427
Title: Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
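Title (reference translation): Mathematical Reasoning of LLMs Under Reinforcement Learning
Authors: Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala
Abstract summary: This paper proposes a fine-grained analytic framework to clarify the impact of reinforcement learning on reasoning. The framework specifically investigates key elements that have been hypothesized to benefit from RL training.
Reference score (independently computed attention level): 82.43575191712726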
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) has become the dominant paradigm for endowing language models with advanced reasoning capabilities. Despite the substantial empirical gains demonstrated by RL-based training methods like GRPO, a granular understanding of their advantages is still lacking. To address this gap, we introduce a fine-grained analytic framework to dissect the impact of RL on reasoning. Our framework specifically investigates key elements that have been hypothesized to benefit from RL training: (1) plan-following and execution, (2) problem decomposition, and (3) improved reasoning and knowledge utilization. Using this framework, we gain insights beyond mere accuracy. For instance, providing models with explicit step-by-step plans surprisingly degrades performance on the most challenging benchmarks, yet RL-tuned models exhibit greater robustness, experiencing markedly smaller performance drops than their base counterparts. This suggests that RL may not primarily enhance the execution of external plans but rather empower models to formulate and follow internal strategies better suited to their reasoning processes. Conversely, we observe that RL enhances the model's capacity to integrate provided knowledge into its reasoning process, leading to performance improvements across diverse tasks. We also study difficulty, showing improved training by developing new ways to exploit hard problems. Our findings lay a foundation for more principled training and evaluation of reasoning models.
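Abstract (reference translation): Reinforcement learning (RL) has become the dominant paradigm for building language models with advanced reasoning capabilities. Despite the substantial gains demonstrated by RL-based training methods such as GRPO, a detailed understanding of their advantages is still lacking. To address this gap, we introduce a fine-grained analytic framework to identify the impact of RL on reasoning. The framework examines key elements that have been hypothesized to benefit from RL training: (1) plan-following and execution, (2) problem decomposition, and (3) improved reasoning and knowledge utilization. Using this framework, we gain insights beyond mere accuracy. For instance, providing models with explicit step-by-step plans surprisingly degrades performance on the most challenging benchmarks, yet RL-tuned models are more robust and show markedly smaller performance drops than their base counterparts. This suggests that RL may not primarily enhance the execution of external plans, but rather empowers models to formulate and follow internal strategies better suited to their reasoning processes. Conversely, RL enhances the model's capacity to integrate provided knowledge into its reasoning process, leading to performance improvements across diverse tasks. We also study difficulty, showing improved training by developing new ways to exploit hard problems. Our findings lay a foundation for more principled training and evaluation of reasoning models.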


Related papers