Fugu-MT Paper Translation (Summary): Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring

arxiv url: http://arxiv.org/abs/2603.14251v1
Date: Sun, 15 Mar 2026 07:00:47 GMT
Last updated on site: 2026-03-17 16:19:35.70051
Title: Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
Authors: Weixin Guan, Liang Li, Jiapeng Liu, Bing Li, Peng Fu, Chengyang Fang, Xiaoshuai Hao, Can Ma, Weiping Wang
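Abstract summary: We propose an early-exit method that mitigates overthinking in LRLMs. Compared with existing early-exit methods, it achieves a larger performance gain over vanilla CoT.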
Reference score (site-computed attention score): 35.58177960646011
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Reasoning Language Models (LRLMs) demonstrate impressive capabilities on complex tasks by utilizing long Chain-of-Thought reasoning. However, they are prone to overthinking, which generates redundant reasoning steps that degrade both performance and efficiency. Recently, early-exit strategies have been proposed to mitigate overthinking by dynamically and adaptively terminating redundant reasoning. However, current early-exit methods either introduce extra training overhead by relying on proxy models or limit inference throughput due to frequent content switching between reasoning and generating probing answers. Moreover, most early-exit methods harm LRLM performance due to over-truncation. Our insight stems from an observation: overthinking often causes LRLMs to deviate from the correct reasoning path, which is frequently accompanied by high-entropy transition tokens. Given this, we propose an early-exit method deeply coupled with the native reasoning process, which leverages the path deviation index as a dedicated monitoring metric for the frequent occurrence of high-entropy transition tokens to dynamically detect and terminate overthinking trajectories. We conduct experiments across multiple benchmarks using LRLMs of different types and scales, and the results indicate that our method delivers the largest performance improvement over vanilla CoT compared to existing early-exit methods.
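The abstract says the method monitors how often high-entropy transition tokens occur, using a path deviation index, but it does not give the index's formula, the token set, or the thresholds. The following is only a minimal sketch under stated assumptions, not the authors' implementation. It computes the Shannon entropy of each next-token distribution and flags a token when it is both in a hypothetical list of transition words (`TRANSITION_WORDS`) and above an entropy threshold. A stand-in "deviation index" is then defined as the fraction of flagged tokens in a sliding window. The names `DeviationMonitor`, `window`, `entropy_threshold`, and `exit_threshold` are illustrative assumptions, not names from the paper.

```python
import math
from collections import deque
from typing import Deque, Sequence

# HYPOTHETICAL transition-word list: the abstract does not say which tokens count.
TRANSITION_WORDS = frozenset({"wait", "alternatively", "hmm", "but", "however"})


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class DeviationMonitor:
    """Hypothetical monitor: the 'deviation index' here is simply the fraction
    of high-entropy transition tokens among the last `window` tokens."""

    def __init__(self, window: int = 64, entropy_threshold: float = 1.0,
                 exit_threshold: float = 0.05) -> None:
        self.flags: Deque[bool] = deque(maxlen=window)
        self.entropy_threshold = entropy_threshold
        self.exit_threshold = exit_threshold

    def deviation_index(self) -> float:
        """Fraction of flagged tokens in the current window (0.0 if empty)."""
        if not self.flags:
            return 0.0
        return sum(self.flags) / len(self.flags)

    def step(self, token: str, probs: Sequence[float]) -> bool:
        """Record one generated token; return True if reasoning should stop."""
        is_transition = token.strip().lower() in TRANSITION_WORDS
        flagged = is_transition and token_entropy(probs) >= self.entropy_threshold
        self.flags.append(flagged)
        # Judge only once the window is full so a short prefix cannot trigger an exit.
        window_full = len(self.flags) == self.flags.maxlen
        return window_full and self.deviation_index() >= self.exit_threshold


if __name__ == "__main__":
    uniform = [0.25] * 4                 # entropy ln(4) ≈ 1.386 nats
    peaked = [0.97, 0.01, 0.01, 0.01]    # entropy ≈ 0.168 nats
    monitor = DeviationMonitor(window=4, entropy_threshold=1.0, exit_threshold=0.5)
    stream = [("so", peaked), ("wait", uniform), ("x", peaked), ("hmm", uniform)]
    print([monitor.step(tok, p) for tok, p in stream])  # [False, False, False, True]
```

In a real decoder, `probs` would be the softmax of each step's logits, and a `True` result would end the thinking phase so the model can move on to its final answer. The sliding window, together with the requirement that it be full before any decision, keeps a single isolated "wait" from cutting reasoning short. This is one plausible way to avoid the over-truncation the abstract criticizes, though the paper's actual mechanism may differ.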

Related papers

Reinforced Efficient Reasoning via Semantically Diverse Exploration [73.41112984160992]
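Reinforcement learning with verifiable rewards (RLVR) has proven effective at strengthening the reasoning of large language models (LLMs). In this work, we propose ROSE, an efficient reasoning method for LLMs based on semantically diverse exploration. The method combines a semantic-entropy-based branching strategy with an $\varepsilon$-exploration mechanism.
Reference translation (metadata) (2026-01-08T15:56:44Z)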
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
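Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths. GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
Reference translation (metadata) (2026-01-07T23:05:17Z)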
Directional Reasoning Injection for Fine-Tuning MLLMs [51.53222423215055]
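Multimodal large language models (MLLMs) are advancing rapidly, but their reasoning ability often lags behind that of strong text-only models. Existing methods for closing this gap rely on supervised fine-tuning with large-scale multimodal reasoning data or on reinforcement learning. To address this problem, we propose DRIFT (Directional Reasoning Injection for Fine-Tuning).
Reference translation (metadata) (2025-10-16T18:06:46Z)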
Fast Thinking for Large Language Models [67.7238685892317]
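We introduce Latent Codebooks for Fast Thinking, a framework that learns a codebook of discrete strategy priors using concise CoT sketches only during training. At inference, the model conditions in a single pass on a small number of continuous thinking switches drawn from the codebook, which enables strategy-level guidance without generating explicit reasoning tokens.
Reference translation (metadata) (2025-09-28T04:19:48Z)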
Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit [114.83867400179354]
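Overthinking can degrade the overall performance of large language models. Reasoning is divided into three stages: an insufficient-exploration stage, a compensatory-reasoning stage, and a reasoning-convergence stage. We develop a lightweight, rule-based thresholding strategy that improves reasoning accuracy.
Reference translation (metadata) (2025-08-25T03:17:17Z)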
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models [35.82665698868508]
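Large language models (LLMs) struggle with high computational cost and error propagation during inference. We propose Meta-Reasoner, a new framework that optimizes inference-time computation by letting the LLM adjust its reasoning strategy during inference. The method improves on previous SOTA methods by 9-12% and reduces inference time by 28-35%.
Reference translation (metadata) (2025-02-27T09:40:13Z)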
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
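This paper proposes Direct-Indirect Reasoning (DIR), which treats direct reasoning (DR) and indirect reasoning (IR) as multiple parallel reasoning paths and combines them to derive the final answer. DIR is simple yet effective and can be easily integrated with existing CoT methods.
Reference translation (metadata) (2024-02-06T03:41:12Z)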

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

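The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).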