Fugu-MT paper translation (abstract): What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding

Paper overview: What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding

arxiv url: http://arxiv.org/abs/2506.06998v1
Date: Sun, 08 Jun 2025 05:08:32 GMT
Status: Translation complete
Last updated in system: 2025-06-10 16:33:10.612371
Title: What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
Title (reference translation): What Distinguishes Reasoning Models? Following the Reasoning Leader for Efficient Decoding
Authors: Ming Li, Zhengyuan Yang, Xiyao Wang, Dianqi Li, Kevin Lin, Tianyi Zhou, Lijuan Wang
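Abstract summary: The paper analyzes the token-level misalignment between reasoning and non-reasoning models and proposes FoReaL-Decoding. On four popular math-reasoning benchmarks, FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%.
Reference score (independently computed attention score): 84.42056293290015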
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large reasoning models (LRMs) achieve strong reasoning performance by emitting long chains of thought. Yet, these verbose traces slow down inference and often drift into unnecessary detail, known as the overthinking phenomenon. To better understand LRMs' behavior, we systematically analyze the token-level misalignment between reasoning and non-reasoning models. While it is expected that their primary difference lies in the stylistic "thinking cues", LRMs uniquely exhibit two pivotal, previously under-explored phenomena: a Global Misalignment Rebound, where their divergence from non-reasoning models persists or even grows as response length increases, and more critically, a Local Misalignment Diminish, where the misalignment concentrates at the "thinking cues" each sentence starts with but rapidly declines in the remainder of the sentence. Motivated by the Local Misalignment Diminish, we propose FoReaL-Decoding, a collaborative fast-slow thinking decoding method for cost-quality trade-off. In FoReaL-Decoding, a Leading model leads the first few tokens for each sentence, and then a weaker draft model completes the following tokens to the end of each sentence. FoReaL-Decoding adopts a stochastic gate to smoothly interpolate between the small and the large model. On four popular math-reasoning benchmarks (AIME24, GPQA-Diamond, MATH500, AMC23), FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%, while preserving 86 to 100% of model performance. These results establish FoReaL-Decoding as a simple, plug-and-play route to controllable cost-quality trade-offs in reasoning-centric tasks.
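Abstract (reference translation): Large reasoning models (LRMs) achieve strong reasoning performance by emitting long chains of thought. However, these verbose traces slow down inference and often drift into unnecessary detail, which is known as the overthinking phenomenon. To better understand the behavior of LRMs, we systematically analyze the token-level misalignment between reasoning and non-reasoning models. Although their main difference is expected to lie in stylistic "thinking cues", LRMs uniquely exhibit two important, previously under-explored phenomena: a Global Misalignment Rebound, in which their divergence from non-reasoning models persists or even grows as response length increases, and, more importantly, a Local Misalignment Diminish, in which the misalignment concentrates at the "thinking cues" that open each sentence and declines rapidly over the rest of the sentence. Motivated by the Local Misalignment Diminish, we propose FoReaL-Decoding, a collaborative fast-slow thinking decoding method for cost-quality trade-offs. In FoReaL-Decoding, a leading model produces the first few tokens of each sentence, and a weaker draft model then completes the remaining tokens up to the end of the sentence. FoReaL-Decoding adopts a stochastic gate to interpolate smoothly between the small and the large model. On four popular math-reasoning benchmarks (AIME24, GPQA-Diamond, MATH500, AMC23), FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%, while preserving 86 to 100% of model performance. These results establish FoReaL-Decoding as a simple, plug-and-play route to controllable cost-quality trade-offs in reasoning-centric tasks.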
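
The decoding loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `foreal_decode`, `NextToken`, `lead_count`, `lead_prob`, and `make_toy_model` are hypothetical, sentence boundaries are detected by a simple punctuation check, and the stochastic gate is assumed to decide once per sentence whether the leading model opens it. The two "models" are toy stand-ins that emit tagged tokens so the hand-off is visible.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps the context so far to its next token.
NextToken = Callable[[List[str]], str]

SENTENCE_END = (".", "?", "!", "\n")  # assumed sentence-boundary markers
EOS = "<eos>"


def foreal_decode(leader: NextToken, draft: NextToken, prompt: List[str],
                  lead_count: int = 3, lead_prob: float = 1.0,
                  max_tokens: int = 64, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Leader writes the first `lead_count` tokens of a sentence, draft finishes it.

    `lead_prob` is the assumed gate: the probability that the leader opens a
    given sentence. 0.0 means draft-only decoding, 1.0 means the leader opens
    every sentence.
    """
    rng = random.Random(seed)
    out: List[str] = []
    sources: List[str] = []  # which model produced each token
    at_sentence_start = True
    lead_left = 0
    while len(out) < max_tokens:
        if at_sentence_start:
            # Stochastic gate, sampled once per sentence.
            lead_left = lead_count if rng.random() < lead_prob else 0
            at_sentence_start = False
        use_leader = lead_left > 0
        tok = (leader if use_leader else draft)(prompt + out)
        if tok == EOS:
            break
        out.append(tok)
        sources.append("leader" if use_leader else "draft")
        if use_leader:
            lead_left -= 1
        if tok.endswith(SENTENCE_END):
            # The next token starts a new sentence; the gate is sampled again.
            at_sentence_start = True
            lead_left = 0
    return out, sources


def make_toy_model(tag: str, sentence_len: int = 5) -> NextToken:
    """Toy stand-in for a language model: emits `tag`, ending a sentence every few tokens."""
    def next_token(context: List[str]) -> str:
        is_last = len(context) % sentence_len == sentence_len - 1
        return f"{tag}." if is_last else tag
    return next_token


tokens, sources = foreal_decode(make_toy_model("L"), make_toy_model("d"),
                                prompt=[], lead_count=2, max_tokens=10)
print(" ".join(tokens))  # L L d d d. L L d d d.
```

Lowering `lead_prob` shifts more sentences entirely to the draft model, which is one simple way to picture the cost-quality knob the abstract mentions; the exact form of the gate in the paper may differ.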

Related papers