Fugu-MT Paper Translation (Summary): CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models

Paper summary: CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models

arxiv url: http://arxiv.org/abs/2505.22017v1
Date: Wed, 28 May 2025 06:24:45 GMT
Status: Translation complete
Last updated in system: 2025-05-29 17:35:50.444674
Title: CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models
Title (reference translation): CoThink: Token-Efficient Reasoning via Instruct Models That Support Reasoning Models
Authors: Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, Aixin Sun
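Abstract summary: Large language models (LLMs) benefit from increased test-time compute, known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and lowering token efficiency. Two causes of this verbosity are identified: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.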
Reference score (independently calculated attention level): 56.40065909544213
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) benefit from increased test-time compute, a phenomenon known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and leading to low token efficiency. By comparing these models with equally sized instruct models, we identify two key causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps. Since LLMs cannot assess the difficulty of a given problem, they tend to apply the same cautious reasoning strategy across all tasks, resulting in inefficient overthinking. To address this, we propose CoThink, an embarrassingly simple pipeline: an instruct model first drafts a high-level solution outline; a reasoning model then works out the solution. We observe that CoThink enables dynamic adjustment of reasoning depth based on input difficulty. Evaluated with three reasoning models (DAPO, DeepSeek-R1, and QwQ) on three datasets (GSM8K, MATH500, and AIME24), CoThink reduces total token generation by 22.3% while maintaining pass@1 accuracy within a 0.42% margin on average. With reference to the instruct model, we formally define reasoning efficiency and observe a potential reasoning efficiency scaling law in LLMs.
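Abstract (reference translation): Large language models (LLMs) benefit from increased test-time compute, known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, generating excessively verbose outputs and lowering token efficiency. Comparing these models with instruct models of equal size reveals two main causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps. Because LLMs cannot assess the difficulty of a given problem, they tend to apply the same cautious reasoning strategy to every task, which results in inefficient overthinking. To address this, we propose CoThink, an embarrassingly simple pipeline in which an instruct model first drafts a high-level solution outline and a reasoning model then works out the solution. CoThink allows reasoning depth to be adjusted dynamically according to input difficulty. Evaluated with three reasoning models (DAPO, DeepSeek-R1, and QwQ) on three datasets (GSM8K, MATH500, and AIME24), CoThink reduces total token generation by 22.3% while keeping pass@1 accuracy within a 0.42% margin on average. Taking the instruct model as a reference, we formally define reasoning efficiency and observe a potential scaling law for reasoning efficiency in LLMs.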
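
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the `Generate` callable type, and the function names `cothink` and `token_reduction` are all assumptions made for illustration, and the models are passed in as plain functions so that any backend could be plugged in. The helper at the end only shows how a relative token saving such as the reported 22.3% would be computed from two token counts.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


def cothink(problem: str, instruct_model: Generate, reasoning_model: Generate) -> str:
    """Run an outline-then-solve pipeline (illustrative prompts, not from the paper)."""
    # Stage 1: the instruct model drafts a high-level solution outline.
    outline_prompt = (
        "Draft a high-level solution outline for the problem below. "
        "Do not compute the final answer.\n\n"
        f"Problem: {problem}"
    )
    outline = instruct_model(outline_prompt)

    # Stage 2: the reasoning model works out the solution, guided by the outline.
    solve_prompt = (
        f"Problem: {problem}\n\n"
        f"Outline:\n{outline}\n\n"
        "Follow the outline and work out the solution."
    )
    return reasoning_model(solve_prompt)


def token_reduction(baseline_tokens: int, cothink_tokens: int) -> float:
    """Return the fractional reduction in generated tokens relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - cothink_tokens / baseline_tokens
```

In this sketch, `cothink_tokens` is assumed to count the tokens of both stages (outline plus solution), since the abstract reports a reduction in total token generation; for example, 777 tokens against a 1000-token baseline gives a reduction of about 0.223.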

Related papers