Fugu-MT Paper Translation (Abstract): Teaching Language Models to Reason with Tools

Paper overview: Teaching Language Models to Reason with Tools

arxiv url: http://arxiv.org/abs/2510.20342v1
Date: Thu, 23 Oct 2025 08:41:44 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:17.628855
Title: Teaching Language Models to Reason with Tools
Authors: Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu
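Abstract summary: Hint-Engineering is a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. CoRT significantly improves efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model.
Reference score (attention score computed independently by this site): 73.21700643314917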
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large reasoning models (LRMs) like OpenAI-o1 have shown impressive capabilities in natural language reasoning. However, these models frequently demonstrate inefficiencies or inaccuracies when tackling complex mathematical operations. While integrating computational tools such as Code Interpreters (CIs) offers a promising solution, it introduces a critical challenge: a conflict between the model's internal, probabilistic reasoning and the external, deterministic knowledge provided by the CI, which often leads models to unproductive deliberation. To overcome this, we introduce CoRT (Code-Optimized Reasoning Training), a post-training framework designed to teach LRMs to effectively utilize CIs. We propose Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. This approach generates high-quality, code-integrated reasoning data specifically tailored to optimize LRM-CI interaction. Using this method, we have synthesized 30 high-quality samples to post-train models ranging from 1.5B to 32B parameters through supervised fine-tuning. CoRT further refines the multi-round interleaving of external CI usage and internal thinking by employing rejection sampling and reinforcement learning. Our experimental evaluations demonstrate CoRT's effectiveness, yielding absolute improvements of 4% and 8% on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B, respectively, across five challenging mathematical reasoning datasets. Moreover, CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model compared to pure natural language reasoning baselines. The models and code are available at: https://github.com/ChengpengLi1003/CoRT.
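The abstract mentions rejection sampling but does not say how it is applied. As a rough illustration only, the generic form of the technique might look like the sketch below: sample several reasoning trajectories per problem, discard those whose final answer does not match the reference, and keep the remainder as training data. The `Trajectory` type, the `rejection_sample` function, and the tie-break that prefers shorter traces are hypothetical names and choices made for this example; they are not taken from the CoRT code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One sampled reasoning trace (hypothetical structure for illustration)."""
    text: str          # full trace, including any code-interpreter calls
    final_answer: str  # answer extracted from the trace
    num_tokens: int    # length of the trace in tokens


def rejection_sample(
    candidates: List[Trajectory],
    reference_answer: str,
    max_keep: int = 1,
) -> List[Trajectory]:
    """Keep trajectories whose final answer matches the reference.

    Assumption: among correct trajectories, shorter ones are preferred.
    This tie-break is an illustrative choice, not something the abstract states.
    """
    target = reference_answer.strip()
    correct = [t for t in candidates if t.final_answer.strip() == target]
    correct.sort(key=lambda t: t.num_tokens)
    return correct[:max_keep]
```

For example, given three sampled traces with answers "42", "41", and " 42 ", calling `rejection_sample(candidates, "42", max_keep=2)` would drop the trace that answered "41" and return the two correct ones, shortest first.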
