Fugu-MT Paper Translation (Abstract): On the Limits of Innate Planning in Large Language Models

Paper overview: On the Limits of Innate Planning in Large Language Models

arxiv url: http://arxiv.org/abs/2511.21591v1
Date: Wed, 26 Nov 2025 17:08:13 GMT
Status: Translation complete
Last updated in system: 2025-11-27 18:37:59.213733
Title: On the Limits of Innate Planning in Large Language Models
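Title (reference translation): On the Limits of Innate Planning in Large Language Models
Authors: Charles Schepanowski, Charles Ling
Abstract summary: Large language models (LLMs) achieve impressive results on many benchmarks, yet their capacity for planning and stateful reasoning remains unclear. The authors study these abilities directly, without code execution or other tools, using the 8-puzzle, a classic task that requires state tracking and goal-directed planning.
Reference score (attention score computed independently by this site): 13.604285158704466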
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) achieve impressive results on many benchmarks, yet their capacity for planning and stateful reasoning remains unclear. We study these abilities directly, without code execution or other tools, using the 8-puzzle: a classic task that requires state tracking and goal-directed planning while allowing precise, step-by-step evaluation. Four models are tested under common prompting conditions (Zero-Shot, Chain-of-Thought, Algorithm-of-Thought) and with tiered corrective feedback. Feedback improves success rates for some model-prompt combinations, but many successful runs are long, computationally expensive, and indirect. We then examine the models with an external move validator that provides only valid moves. Despite this level of assistance, none of the models solve any puzzles in this setting. Qualitative analysis reveals two dominant deficits across all models: (1) brittle internal state representations, leading to frequent invalid moves, and (2) weak heuristic planning, with models entering loops or selecting actions that do not reduce the distance to the goal state. These findings indicate that, in the absence of external tools such as code interpreters, current LLMs have substantial limitations in planning and that further progress may require mechanisms for maintaining explicit state and performing structured search.
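Abstract (reference translation): Large language models (LLMs) achieve impressive results on many benchmarks, but their capacity for planning and stateful reasoning is still unclear. These abilities are studied directly, without code execution or other tools. The 8-puzzle is a classic task that requires state tracking and goal-directed planning, and it allows precise, step-by-step evaluation. Four models were tested under common prompting conditions (Zero-Shot, Chain-of-Thought, Algorithm-of-Thought) and with tiered corrective feedback. Feedback improves the success rate for some model-prompt combinations, but many successful runs are long, computationally expensive, and indirect. The models are then examined with an external move validator that provides only valid moves. Despite this level of assistance, none of the models solves any puzzle in this setting. Qualitative analysis reveals two dominant deficits: (1) brittle internal state representations, which lead to frequent invalid moves, and (2) weak heuristic planning, in which models enter loops or select actions that do not reduce the distance to the goal state. These results suggest that, without external tools such as code interpreters, current LLMs have substantial limitations in planning, and that further progress may require mechanisms for maintaining explicit state and performing structured search.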
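
The two mechanisms the abstract refers to, an external move validator and a check on whether a move reduces the distance to the goal state, can be illustrated with a minimal Python sketch. This is not the authors' code. The board encoding (a 9-tuple in row-major order with 0 as the blank), the goal layout, the move names, and the use of Manhattan distance as the distance measure are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple

# Assumed encoding: 9 entries in row-major order, 0 marks the blank.
Board = Tuple[int, ...]
# Assumed goal layout; the paper's actual goal state is not given in the abstract.
GOAL: Board = (1, 2, 3, 4, 5, 6, 7, 8, 0)

# Direction in which the blank moves -> (row delta, column delta).
DELTAS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def valid_moves(board: Board) -> Dict[str, Board]:
    """Hypothetical validator: map each legal blank move to the resulting board."""
    blank = board.index(0)
    row, col = divmod(blank, 3)
    result: Dict[str, Board] = {}
    for name, (d_row, d_col) in DELTAS.items():
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < 3 and 0 <= new_col < 3:
            target = new_row * 3 + new_col
            cells = list(board)
            # Slide the neighbouring tile into the blank position.
            cells[blank], cells[target] = cells[target], cells[blank]
            result[name] = tuple(cells)
    return result


def manhattan(board: Board, goal: Board = GOAL) -> int:
    """Sum of per-tile Manhattan distances to the goal (blank excluded)."""
    total = 0
    for index, tile in enumerate(board):
        if tile == 0:
            continue
        goal_index = goal.index(tile)
        total += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return total


def improving_moves(board: Board) -> List[str]:
    """Legal moves that strictly reduce the assumed distance to the goal."""
    current = manhattan(board)
    return [name for name, nxt in valid_moves(board).items() if manhattan(nxt) < current]
```

Under these assumptions, a solver shown only the keys of `valid_moves(board)` cannot make an illegal move, so any remaining failures would come from move selection, for example repeatedly choosing moves outside `improving_moves(board)` or revisiting earlier boards.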

Related papers