Fugu-MT Paper Translation (Summary): THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Paper summary: THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

arxiv url: http://arxiv.org/abs/2509.13761v2
Date: Fri, 03 Oct 2025 12:48:44 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:51.980412
Title: THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Title (reference translation): THOR: Tool-Equipped Hierarchical Optimization via RL for Mathematical Reasoning
Authors: Qikai Chang, Zhenrong Zhang, Pengfei Hu, Jun Du, Jiefeng Ma, Yicheng Pan, Jianshu Zhang, Quan Liu, Jianqing Gao
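Abstract summary: Large language models (LLMs) have made remarkable progress in mathematical reasoning. Despite recent advances, existing methods face three key challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). The proposed method generalizes strongly across diverse models and works effectively with both reasoning and non-reasoning models.
Reference score (attention score calculated by this site): 25.605096023894834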
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have made remarkable progress in mathematical reasoning, but still struggle with high-precision tasks like numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach to bridge this gap. Despite recent advances, existing methods struggle with three key challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). First, we introduce TIRGen, a multi-agent actor-critic-based pipeline for constructing high-quality datasets of tool-integrated reasoning paths, aligning with the policy and generalizing well across diverse models. Second, to perform fine-grained hierarchical optimization, we introduce an RL strategy that jointly optimizes for both episode-level problem solving and step-level code generation. This is motivated by our key insight that the success of an intermediate tool call is a strong predictor of the final answer's correctness. Finally, THOR incorporates a self-correction mechanism that leverages immediate tool feedback to dynamically revise erroneous reasoning paths during inference. Our approach demonstrates strong generalization across diverse models, performing effectively in both reasoning and non-reasoning models. It further achieves state-of-the-art performance among models of similar scale on multiple mathematical benchmarks, while also delivering consistent improvements on code benchmarks. Our code will be publicly available at https://github.com/JingMog/THOR.
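Abstract (reference translation): Large language models (LLMs) have made remarkable progress in mathematical reasoning, but they continue to struggle with high-precision tasks such as numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach to closing this gap. Despite recent advances, existing methods face three major challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). First, we introduce TIRGen, a multi-agent actor-critic-based pipeline that builds high-quality datasets of tool-integrated reasoning paths, aligned with the policy and generalizing well across diverse models. Second, to achieve fine-grained hierarchical optimization, we introduce an RL strategy that jointly optimizes episode-level problem solving and step-level code generation. This is motivated by our key insight that the success of an intermediate tool call is a strong predictor of the final answer's correctness. Finally, THOR incorporates a self-correction mechanism that uses immediate tool feedback to dynamically revise erroneous reasoning paths during inference. The proposed method generalizes strongly across diverse models and works effectively with both reasoning and non-reasoning models. Furthermore, it achieves state-of-the-art performance among models of similar scale on multiple mathematical benchmarks and delivers consistent improvements on code benchmarks. Our code will be released at https://github.com/JingMog/THOR.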
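The abstract does not spell out THOR's hierarchical objective. As a rough illustration only, the Python sketch below combines an episode-level and a step-level signal under assumptions that are not taken from the paper: an episode-level reward of 1.0 for a correct final answer (0.0 otherwise); a step-level reward of 1.0 for each code block whose tool call executes successfully; group-relative normalization of rewards across rollouts of the same problem (a GRPO-style choice); and a REINFORCE-style surrogate loss that adds the step-level term with a hypothetical weight `step_weight`. The names `Rollout`, `normalize`, `episode_advantages`, `step_advantages`, and `hierarchical_loss` are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled solution (hypothetical structure, not THOR's data format)."""
    answer_correct: bool  # episode level: was the final answer correct?
    tool_call_ok: list[bool] = field(default_factory=list)  # step level: one flag per code block


def normalize(values: list[float], eps: float = 1e-6) -> list[float]:
    """Assumed GRPO-style group normalization: (r - mean) / (std + eps)."""
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def episode_advantages(group: list[Rollout]) -> list[float]:
    """Assumption: reward 1.0 for a correct final answer, 0.0 otherwise."""
    return normalize([1.0 if r.answer_correct else 0.0 for r in group])


def step_advantages(group: list[Rollout]) -> list[list[float]]:
    """Assumption: reward each code step 1.0 if its tool call succeeded, else 0.0,
    normalized over all code steps in the group, then split back per rollout."""
    flat = normalize([1.0 if ok else 0.0 for r in group for ok in r.tool_call_ok])
    per_rollout, start = [], 0
    for r in group:
        end = start + len(r.tool_call_ok)
        per_rollout.append(flat[start:end])
        start = end
    return per_rollout


def hierarchical_loss(group, episode_logps, step_logps, step_weight=0.5):
    """REINFORCE-style surrogate to minimize (hypothetical weighting).
    episode_logps[i]: summed log-prob of all tokens in rollout i.
    step_logps[i][k]: summed log-prob of the k-th code block in rollout i."""
    ep_adv = episode_advantages(group)
    st_adv = step_advantages(group)
    loss = 0.0
    for i in range(len(group)):
        loss -= ep_adv[i] * episode_logps[i]  # episode-level term
        # step-level term: one advantage per code block
        loss -= step_weight * sum(a * lp for a, lp in zip(st_adv[i], step_logps[i]))
    return loss / len(group)


# Usage: four rollouts of the same problem.
group = [
    Rollout(True, [True, True]),
    Rollout(False, [True, False]),
    Rollout(True, [True]),
    Rollout(False, [False]),
]
print(episode_advantages(group))
print(step_advantages(group))
```

In this sketch, scoring each code block separately gives the policy a denser signal than the single end-of-episode reward alone; the abstract motivates a step-level term with the observation that intermediate tool-call success predicts final-answer correctness.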

List of related papers