Fugu-MT Paper Translation (Abstract): A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap

Paper abstract: A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap

arxiv url: http://arxiv.org/abs/2506.18957v1
Date: Mon, 23 Jun 2025 17:14:21 GMT
Status: Translation complete
Last updated in system: 2025-06-25 19:48:23.315209
Title: A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
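Title (reference translation): A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
Authors: Sheraz Khan, Subha Madhavan, Kannan Natarajan
Abstract summary: We argue that the observed failure is not evidence of a fundamental cognitive boundary, but rather a predictable outcome of system-level constraints. A model that initially declared a puzzle impossible when confined to text-only generation now uses agentic tools not only to solve it but also to master variations of complexity far beyond the reasoning cliff it previously failed to surmount.
Reference score (independently computed attention score): 0.39073867995073247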
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The recent work by Shojaee et al. (2025), titled The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, presents a compelling empirical finding, a reasoning cliff, where the performance of Large Reasoning Models (LRMs) collapses beyond a specific complexity threshold, which the authors posit as an intrinsic scaling limitation of Chain-of-Thought (CoT) reasoning. This commentary, while acknowledging the study's methodological rigor, contends that this conclusion is confounded by experimental artifacts. We argue that the observed failure is not evidence of a fundamental cognitive boundary, but rather a predictable outcome of system-level constraints in the static, text-only evaluation paradigm, including tool use restrictions, context window recall issues, the absence of crucial cognitive baselines, inadequate statistical reporting, and output generation limits. We reframe this performance collapse through the lens of an agentic gap, asserting that the models are not failing at reasoning, but at execution within a profoundly restrictive interface. We empirically substantiate this critique by demonstrating a striking reversal. A model, initially declaring a puzzle impossible when confined to text-only generation, now employs agentic tools to not only solve it but also master variations of complexity far beyond the reasoning cliff it previously failed to surmount. Additionally, our empirical analysis of tool-enabled models like o4-mini and GPT-4o reveals a hierarchy of agentic reasoning, from simple procedural execution to complex meta-cognitive self-correction, which has significant implications for how we define and measure machine intelligence. The illusion of thinking attributed to LRMs is less a reasoning deficit and more a consequence of an otherwise capable mind lacking the tools for action.
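Abstract (reference translation): The recent work by Shojaee et al. (2025), titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity", presents a compelling empirical finding, a reasoning cliff, in which the performance of Large Reasoning Models (LRMs) collapses beyond a specific complexity threshold; the authors posit this as an intrinsic scaling limitation of Chain-of-Thought (CoT) reasoning. This commentary, while acknowledging the study's methodological rigor, contends that this conclusion is confounded by experimental artifacts. We argue that the observed failure is not evidence of a fundamental cognitive boundary, but rather a predictable outcome of system-level constraints in the static, text-only evaluation paradigm, including tool use restrictions, context window recall issues, the absence of crucial cognitive baselines, inadequate statistical reporting, and output generation limits. We reframe this performance collapse through the lens of an agentic gap, asserting that the models are failing not at reasoning but at execution within a profoundly restrictive interface. We empirically substantiate this critique by demonstrating a striking reversal. A model that initially declared a puzzle impossible when confined to text-only generation now uses agentic tools not only to solve it but also to master variations of complexity far beyond the reasoning cliff it previously failed to surmount. In addition, our empirical analysis of tool-enabled models such as o4-mini and GPT-4o reveals a hierarchy of agentic reasoning, from simple procedural execution to complex meta-cognitive self-correction, which has significant implications for how we define and measure machine intelligence. The illusion of thinking attributed to LRMs is less a reasoning deficit and more a consequence of an otherwise capable mind lacking the tools for action.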

Related papers

Lost at the Beginning of Reasoning [82.18834329384514]
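We show that the first reasoning step has a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that uses a reward model to identify and retain high-quality first reasoning steps. To systematically evaluate model self-correction ability, we introduce a new benchmark constructed with deliberately flawed first reasoning steps.
Paper reference translation (metadata) (2025-06-27T09:53:57Z)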
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity [0.0]
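Large Reasoning Models (LRMs) exhibit an "accuracy collapse" on planning puzzles beyond a certain complexity threshold. We demonstrate that these results primarily reflect experimental design limitations rather than fundamental reasoning failures.
Paper reference translation (metadata) (2025-06-10T21:16:53Z)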
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity [16.266145641151375]
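Large reasoning models generate a detailed thinking process before providing an answer. We show that LRMs face a complete accuracy collapse beyond a certain complexity. We also examine the reasoning traces in more depth and study the patterns of explored solutions.
Paper reference translation (metadata) (2025-06-07T22:42:29Z)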
Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
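Large reasoning models (LRMs) have significantly improved their reasoning capabilities by generating long chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during generation. We propose Self-Braking Tuning (SBT), a new framework that tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
Paper reference translation (metadata) (2025-05-20T16:53:40Z)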
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning [39.613595533503144]
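Chain-of-Thought (CoT) prompting is widely recognized for its ability to enhance reasoning capabilities in large language models. We show that CoT consistently underperforms direct answering across varying model scales and benchmark complexities. We reveal a fundamental explicit-implicit duality driving CoT's performance in pattern-based ICL.
Paper reference translation (metadata) (2025-04-07T13:51:06Z)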
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models [46.02816479205161]
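We propose Atomic Reasoner (AR), a cognitive inference strategy that enables fine-grained reasoning. AR decomposes the reasoning process into atomic cognitive units and employs a cognitive routing mechanism. The results show AR's superior reasoning capabilities without the computational burden of exhaustive solution searches.
Paper reference translation (metadata) (2025-03-20T08:34:53Z)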
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
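Benchmarks are plagued by various biases, artifacts, and leakage. Models may behave unreliably because of poorly explored failure modes. Causality offers an ideal framework for systematically addressing these challenges.
Paper reference translation (metadata) (2025-02-07T17:01:37Z)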
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
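We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning. Our main finding is that existing LLMs struggle to accurately identify erroneous reasoning steps and may fall short of guaranteeing the validity of self-verification methods.
Paper reference translation (metadata) (2023-11-14T07:13:10Z)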
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
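We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL. We then propose the Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving properties of the Q-network during training. Our theory can reliably determine whether training will diverge at an early stage.
Paper reference translation (metadata) (2023-10-06T17:57:44Z)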
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [55.66353783572259]
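Causal-Consistency Chain-of-Thought leverages multi-agent collaboration to strengthen the faithfulness and causality of foundation models. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
Paper reference translation (metadata) (2023-08-23T04:59:21Z)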

The related papers list is generated automatically from the titles and abstracts of papers on this site.

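This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any consequences arising from the use of this site (including all information and translations).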