Fugu-MT Paper Translation (Overview): The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper overview: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

arxiv url: http://arxiv.org/abs/2506.06941v1
Date: Sat, 07 Jun 2025 22:42:29 GMT
Status: Translation complete
Last updated in system: 2025-06-10 16:33:10.57653
Title: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Authors: Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar
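Abstract summary: Large reasoning models (LRMs) generate detailed thinking processes before providing answers. We show that LRMs face a complete accuracy collapse beyond certain levels of complexity. We also examine the reasoning traces in more depth and study the patterns of explored solutions.
Reference score (site-computed attention score): 16.266145641151375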
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established math and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from contamination and does not provide insights into the reasoning traces. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs think. Through extensive experiments, we show that LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having remaining token budget. By comparing LRMs with their standard LLM counterparts under the same inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models outperform LRMs, (2) medium-complexity tasks where LRMs demonstrate an advantage, and (3) high-complexity tasks where both models face complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models' computational behavior, shedding light on their strengths and limitations and raising questions about their reasoning capabilities.
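The abstract does not name the puzzles it uses. As an illustration only, the Python sketch below assumes a Tower of Hanoi style environment (our choice for this sketch, not necessarily the exact setup in the paper) to show what "precise manipulation of complexity while maintaining consistent logical structures" can look like: a single parameter `n` (the number of disks) controls difficulty, since the minimal solution has 2**n - 1 moves, while the rules never change. The validator replays a proposed move list step by step, so an intermediate trace can be checked, not only the final answer. The function names `initial_state`, `optimal_moves`, and `validate` are hypothetical.

```python
from typing import List, Tuple

# A move is (source peg, target peg); pegs are numbered 0, 1, 2.
Move = Tuple[int, int]


def initial_state(n: int) -> List[List[int]]:
    """All n disks start on peg 0: largest (n) at the bottom, smallest (1) on top."""
    return [list(range(n, 0, -1)), [], []]


def optimal_moves(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Classic recursive solution; it always contains exactly 2**n - 1 moves."""
    if n == 0:
        return []
    return (
        optimal_moves(n - 1, src, aux, dst)   # park the n-1 smaller disks on aux
        + [(src, dst)]                        # move the largest disk
        + optimal_moves(n - 1, aux, dst, src) # bring the smaller disks back on top
    )


def validate(n: int, moves: List[Move]) -> Tuple[bool, int]:
    """Replay moves from the initial state, checking every step against the rules.

    Returns (solved, first_invalid_index). first_invalid_index is -1 when every
    move was legal; solved is True only if all n disks end up on peg 2.
    """
    pegs = initial_state(n)
    for i, (src, dst) in enumerate(moves):
        # Reject unknown pegs, no-op moves, and moves from an empty peg.
        if not (0 <= src < 3 and 0 <= dst < 3) or src == dst or not pegs[src]:
            return False, i
        disk = pegs[src][-1]
        # A larger disk may never be placed on a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, i
        pegs[dst].append(pegs[src].pop())
    solved = pegs[2] == list(range(n, 0, -1))
    return solved, -1


if __name__ == "__main__":
    # Complexity knob: solution length grows exponentially with n.
    for n in range(1, 6):
        moves = optimal_moves(n)
        print(n, len(moves), validate(n, moves))
```

Because `validate` reports the index of the first illegal move, a trace can be scored by how far it gets before breaking the rules, and a legal but incomplete trace (`(False, -1)`) can be told apart from one that violates the rules partway through.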