Fugu-MT Paper Translation (Abstract): Teaching Language Models to Think in Code

Paper abstract: Teaching Language Models to Think in Code

arxiv url: http://arxiv.org/abs/2605.07237v2
Date: Mon, 11 May 2026 02:57:28 GMT
Status: Translation complete
Last updated in system: 2026-05-12 19:24:01.346717
Title: Teaching Language Models to Think in Code
Title (reference translation): Teaching language models to think in code
Authors: Hyeon Hwang, Jiwoo Lee, Jaewoo Kang
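Abstract summary: We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning.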
Reference score (independently computed attention score): 18.87981500987763
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. A ThinC trajectory begins with a brief NL planning step, after which all reasoning unfolds through code blocks connected only by their execution outputs. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning. ThinC-4B consistently outperforms every TIR baseline on five competition-level math benchmarks and even surpasses the much larger Qwen3-235B-A22B-Thinking. Further analysis shows that ThinC reasons through code: 99.2% of its final answers are grounded in interpreter output, and the model recovers reliably from code execution failures without intermediate NL reasoning. Our code and models will be released soon.
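Abstract (reference translation): Tool-integrated reasoning (TIR), which combines natural language (NL) reasoning with code execution, has emerged as the dominant paradigm for mathematical problem solving in language models. This interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. A ThinC trajectory begins with a brief NL planning step, after which all reasoning unfolds through code blocks connected only by their execution outputs. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning. ThinC-4B consistently outperforms every TIR baseline on five competition-level math benchmarks and even surpasses the much larger Qwen3-235B-A22B-Thinking. Further analysis shows that ThinC reasons through code: 99.2% of its final answers are grounded in interpreter output, and the model recovers reliably from code execution failures without intermediate NL reasoning. The code and models will be released soon.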
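The trajectory structure described in the abstract (one brief NL plan, then code blocks linked only by their execution outputs) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says is not yet released. The names `run_block`, `rollout`, and `scripted_policy` are hypothetical, the scripted policy stands in for a trained model, and the shared interpreter namespace across blocks is an assumption.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple

# Hypothetical policy interface (an assumption, not from the paper): given the
# NL plan and the interpreter outputs so far, return the next code block, or
# None when the trajectory is finished.
Policy = Callable[[str, List[str]], Optional[str]]


def run_block(code: str, namespace: dict) -> str:
    """Execute one code block; return its printed output or a short error line."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # The error text is the only signal passed on to the next block.
        buffer.write(f"{type(exc).__name__}: {exc}")
    return buffer.getvalue().strip()


def rollout(plan: str, policy: Policy, max_blocks: int = 8) -> Tuple[List[str], str]:
    """Run code blocks in sequence, connected only by their execution outputs."""
    namespace: dict = {}      # assumed: interpreter state persists across blocks
    outputs: List[str] = []
    for _ in range(max_blocks):
        code = policy(plan, outputs)
        if code is None:
            break
        outputs.append(run_block(code, namespace))
    # The final answer is read from interpreter output, not from NL text.
    answer = outputs[-1] if outputs else ""
    return outputs, answer


def scripted_policy(plan: str, outputs: List[str]) -> Optional[str]:
    """Scripted stand-in for a model: the second block fails, the third retries."""
    blocks = [
        "squares = [k * k for k in range(1, 11)]\nprint(len(squares))",
        "print(sum(square))",   # deliberate NameError (misspelled variable)
        "print(sum(squares))",  # corrected retry after the failure
    ]
    return blocks[len(outputs)] if len(outputs) < len(blocks) else None


outputs, answer = rollout("Compute the sum of the squares of 1..10.", scripted_policy)
print(answer)  # 385
```

In this sketch the failed second block contributes only an error line to `outputs`, and the next block proceeds without any NL commentary in between, which mirrors the recovery behavior the abstract describes at a toy scale.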
