Fugu-MT Paper Translation (Summary): Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting

Paper summary: Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting

arxiv url: http://arxiv.org/abs/2509.26522v1
Date: Tue, 30 Sep 2025 16:59:37 GMT
Status: Translation complete
Last updated on site: 2025-10-01 14:45:00.218283
Title: Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting
Authors: Xi Wang, James McInerney, Lequn Wang, Nathan Kallus
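Abstract summary: We show that large reasoning models overthink, continuing to revise their answers even after reaching the correct solution. We propose Entropy After </Think> (EAT) for monitoring reasoning and deciding whether to exit it early. EAT reduces token usage by 13-21% without harming accuracy.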
Reference score (site-computed attention score): 38.93424884988798
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large reasoning models show improved performance with longer chains of thought. However, recent work has highlighted (qualitatively) their tendency to overthink, continuing to revise answers even after reaching the correct solution. We quantitatively confirm this inefficiency by tracking Pass@1 for answers averaged over a large number of rollouts and find that the model often begins to always produce the correct answer early in the reasoning, making extra reasoning a waste of tokens. To detect and prevent overthinking, we propose a simple and inexpensive novel signal -- Entropy After </Think> (EAT) -- for monitoring and deciding whether to exit reasoning early. By appending a stop thinking token (</think>) and monitoring the entropy of the following token as the model reasons, we obtain a trajectory that decreases and stabilizes when Pass@1 plateaus; thresholding its variance under an exponential moving average yields a practical stopping rule. Importantly, our approach enables adaptively allocating compute based on the EAT trajectory, allowing us to spend compute in a more efficient way compared with fixing the token budget for all questions. Empirically, on MATH500 and AIME2025, EAT reduces token usage by 13-21% without harming accuracy, and it remains effective in black-box settings where logits from the reasoning model are not accessible and EAT is instead computed with proxy models.
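The abstract describes the EAT signal and its stopping rule only at a high level. The block below is a minimal sketch under stated assumptions, not the authors' implementation. `next_token_entropy` computes the Shannon entropy of the next-token distribution from raw logits, i.e. the quantity that would be read just after appending `</think>` to the current reasoning prefix. The hypothetical class `EMAVarianceStopper` assumes that "variance under an exponential moving average" means the standard incremental exponentially weighted variance; its `alpha`, `threshold`, and `min_steps` values are illustrative placeholders, not values from the paper. The helper `first_exit_step` and the demo are also hypothetical, and the demo fakes logits instead of querying a model.

```python
import math
from typing import Iterable, Optional, Sequence


def next_token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), using the max-shift trick."""
    m = max(logits)
    shifted = [x - m for x in logits]
    log_z = math.log(sum(math.exp(s) for s in shifted))
    # log p_i = shifted_i - log_z, so H = -sum_i p_i * log p_i
    return -sum(math.exp(s - log_z) * (s - log_z) for s in shifted)


class EMAVarianceStopper:
    """Hypothetical stopping rule: fire once the exponentially weighted
    variance of the EAT trace drops below a threshold.

    alpha, threshold and min_steps are illustrative defaults (assumptions).
    """

    def __init__(self, alpha: float = 0.2, threshold: float = 1e-3, min_steps: int = 5):
        self.alpha = alpha
        self.threshold = threshold
        self.min_steps = min_steps
        self.mean: Optional[float] = None
        self.var = 0.0
        self.steps = 0

    def update(self, entropy: float) -> bool:
        """Feed one EAT value; return True if reasoning should stop now."""
        self.steps += 1
        if self.mean is None:
            self.mean = entropy  # first observation initializes the EMA mean
        else:
            delta = entropy - self.mean
            self.mean += self.alpha * delta
            # Incremental exponentially weighted variance update
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
        return self.steps >= self.min_steps and self.var < self.threshold


def first_exit_step(eat_trace: Iterable[float], stopper: EMAVarianceStopper) -> Optional[int]:
    """Index of the first checkpoint at which the rule fires, or None if it never does."""
    for i, h in enumerate(eat_trace):
        if stopper.update(h):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical setup: in practice each checkpoint would append "</think>" to the
    # current reasoning prefix and read the next-token logits, from the reasoning
    # model or, in the black-box case, from a proxy model. Here the logits are faked
    # so that the distribution sharpens and then settles.
    sharpness = [0.5, 1.0, 2.0, 3.0, 4.0, 4.5] + [4.6] * 40
    trace = [next_token_entropy([s, 0.0, 0.0, 0.0]) for s in sharpness]
    exit_step = first_exit_step(trace, EMAVarianceStopper())
    print("EAT trace (first 8):", [round(h, 3) for h in trace[:8]])
    print("exit at checkpoint:", exit_step)
    if exit_step is not None:
        print("checkpoints skipped:", len(trace) - exit_step - 1)
```

The `min_steps` warm-up is a design choice made for this sketch: the exponentially weighted variance is exactly zero after the first observation, so without a warm-up the rule would fire immediately. Because the stopper only looks at the trajectory of a single scalar per checkpoint, each question stops at its own point, which is the adaptive compute allocation the abstract contrasts with a fixed token budget.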

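Related papers

Entropy-Guided Loop: Achieving Reasoning through Uncertainty-Aware Generation [0.0]
Entropy-guided refinement is a lightweight test-time loop that uses token-level uncertainty to trigger a single, targeted refinement pass. We show that this uncertainty-aware loop is an effective middle ground between single-pass inference and expensive reasoning chains.
Reference translation (metadata) (2025-08-26T22:29:12Z)
Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit [114.83867400179354]
Overthinking can degrade the overall performance of large language models. We divide reasoning into three stages: insufficient exploration, compensatory reasoning, and reasoning convergence. We develop a lightweight, rule-based thresholding strategy that improves reasoning accuracy.
Reference translation (metadata) (2025-08-25T03:17:17Z)
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model [7.8354921036790275]
Large reasoning models (LRMs) excel at solving complex problems but face an overthinking dilemma. When handling simple tasks, they often produce verbose responses overloaded with thinking tokens. These tokens trigger unnecessary high-level reasoning behaviors such as reflection and backtracking, which reduce efficiency.
Reference translation (metadata) (2025-06-30T13:30:33Z)
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency [24.56015832583054]
We examine whether advanced reasoning requires explicit self-reflection, signaled by tokens such as "Wait" and "Hmm". We propose NoWait, a simple and effective approach that disables explicit self-reflection by suppressing these tokens during inference.
Reference translation (metadata) (2025-06-10T01:54:04Z)
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [56.40065909544213]
Large language models (LLMs) benefit from increased test-time computation, known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and reducing token efficiency. Two factors contribute: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.
Reference translation (metadata) (2025-05-28T06:24:45Z)
VeriThinker: Learning to Verify Makes Reasoning Model Efficient [52.74493506816969]
Large reasoning models excel at complex tasks using Chain-of-Thought (CoT) reasoning. However, their tendency to overthink leads to excessively long reasoning chains. We introduce VeriThinker, a novel approach to CoT compression.
Reference translation (metadata) (2025-05-23T14:17:56Z)
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models [30.184895117009457]
We propose Difficulty-Adaptive Slow Thinking (DAST), which lets a model autonomously adjust the length of its Chain-of-Thought (CoT) based on problem difficulty. Experiments across diverse datasets and model scales show that DAST effectively mitigates overthinking while maintaining reasoning accuracy on complex problems.
Reference translation (metadata) (2025-03-06T14:23:06Z)
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [98.3430004984531]
We propose Length-Harmonizing Fine-Tuning (O1-Pruner) to minimize reasoning overhead while maintaining accuracy. Our code will be available soon at https://github.com/StarDewXXX/O1-Pruner.
Reference translation (metadata) (2025-01-22T01:35:11Z)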

The related papers list is generated automatically from the titles and abstracts of papers on this site.

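The information above concerns the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.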