Fugu-MT Paper Translation (Abstract): Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach

Paper abstract: Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach

arxiv url: http://arxiv.org/abs/2509.07820v1
Date: Tue, 09 Sep 2025 14:57:15 GMT
Status: Translation complete
System update date: 2025-09-10 14:38:27.367909
Title: Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach
Title (reference translation): Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach
Authors: João Paulo Nogueira, Wentao Sun, Alonso Silva, Laith Zumot
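Abstract summary: Certainty-Guided Reasoning (CGR) reduces token usage while improving baseline accuracy. CGR can eliminate millions of tokens in aggregate, with tunable trade-offs between certainty thresholds and efficiency. By integrating confidence into the reasoning process, CGR makes large reasoning language models more adaptive, trustworthy, and resource efficient.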
Reference score (independently computed attention score): 0.15749416770494704
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rise of large reasoning language models (LRLMs) has unlocked new potential for solving complex tasks. These models operate with a thinking budget, that is, a predefined number of reasoning tokens used to arrive at a solution. We propose a novel approach, inspired by the generator/discriminator framework in generative adversarial networks, in which a critic model periodically probes its own reasoning to assess whether it has reached a confident conclusion. If not, reasoning continues until a target certainty threshold is met. This mechanism adaptively balances efficiency and reliability by allowing early termination when confidence is high, while encouraging further reasoning when uncertainty persists. Through experiments on the AIME2024 and AIME2025 datasets, we show that Certainty-Guided Reasoning (CGR) improves baseline accuracy while reducing token usage. Importantly, extended multi-seed evaluations over 64 runs demonstrate that CGR is stable, reducing variance across seeds and improving exam-like performance under penalty-based grading. Additionally, our token savings analysis shows that CGR can eliminate millions of tokens in aggregate, with tunable trade-offs between certainty thresholds and efficiency. Together, these findings highlight certainty as a powerful signal for reasoning sufficiency. By integrating confidence into the reasoning process, CGR makes large reasoning language models more adaptive, trustworthy, and resource efficient, paving the way for practical deployment in domains where both accuracy and computational cost matter.
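Abstract (reference translation): The rise of large reasoning language models (LRLMs) has opened up new possibilities for solving complex tasks. These models work with a thinking budget, that is, a predefined number of reasoning tokens used to reach a solution. We propose a new approach, inspired by the generator/discriminator framework of generative adversarial networks, in which a critic model periodically probes its own reasoning to assess whether it has reached a confident conclusion. If it has not, reasoning continues until a target certainty threshold is met. This mechanism adaptively balances efficiency and reliability by allowing early termination when confidence is high and encouraging further reasoning when uncertainty persists. Through experiments on the AIME2024 and AIME2025 datasets, we show that Certainty-Guided Reasoning (CGR) improves baseline accuracy while reducing token usage. Importantly, extended multi-seed evaluations over 64 runs show that CGR is stable, reducing variance across seeds and improving exam-like performance under penalty-based grading. In addition, our token savings analysis shows that CGR can eliminate millions of tokens in aggregate, with tunable trade-offs between certainty thresholds and efficiency. Taken together, these findings highlight certainty as a powerful signal of reasoning sufficiency. By integrating confidence into the reasoning process, CGR makes large reasoning language models more adaptive, trustworthy, and resource efficient, paving the way for practical deployment in domains where both accuracy and computational cost matter.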
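
The stopping rule described in the abstract (probe certainty periodically, stop early once a threshold is met, otherwise keep reasoning up to the thinking budget) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how certainty is measured or how often the probe runs, so `generate_step`, `probe_answer`, `probe_every`, and the toy stand-ins below are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CGRResult:
    answer: str
    certainty: float
    tokens_used: int
    stopped_early: bool


def certainty_guided_reasoning(
    generate_step: Callable[[List[str]], str],                # hypothetical: emits one reasoning token
    probe_answer: Callable[[List[str]], Tuple[str, float]],   # hypothetical: (answer, certainty in [0, 1])
    budget: int,        # thinking budget: maximum number of reasoning tokens
    probe_every: int,   # assumed probing interval, in tokens
    threshold: float,   # target certainty threshold
) -> CGRResult:
    tokens: List[str] = []
    while len(tokens) < budget:
        tokens.append(generate_step(tokens))
        # Probe only at the chosen interval, and not at the very end,
        # because the final probe below covers the exhausted-budget case.
        if len(tokens) % probe_every == 0 and len(tokens) < budget:
            answer, certainty = probe_answer(tokens)
            if certainty >= threshold:
                # Confident enough: terminate early and save the remaining budget.
                return CGRResult(answer, certainty, len(tokens), True)
    # Budget exhausted without reaching the threshold: answer anyway.
    answer, certainty = probe_answer(tokens)
    return CGRResult(answer, certainty, len(tokens), False)


# Toy stand-ins (assumptions for illustration only, not a real model).
def toy_generate_step(tokens: List[str]) -> str:
    return f"t{len(tokens)}"


def toy_probe_answer(tokens: List[str]) -> Tuple[str, float]:
    # Pretend certainty grows linearly with the amount of reasoning so far.
    return "42", min(1.0, len(tokens) / 50)


result = certainty_guided_reasoning(
    toy_generate_step, toy_probe_answer, budget=100, probe_every=10, threshold=0.8
)
print(result)
```

Raising `threshold` in this sketch makes the loop spend more of the budget before answering, while lowering it saves more tokens, which mirrors the tunable trade-off between certainty thresholds and efficiency that the abstract mentions.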

Related papers