Fugu-MT Paper Translation (Abstract): Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

Paper abstract: Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

arxiv url: http://arxiv.org/abs/2508.14564v1
Date: Wed, 20 Aug 2025 09:36:53 GMT
Status: Translation complete
System update date: 2025-08-21 16:52:41.415317
Title: Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
Title (reference translation): Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
Authors: Luca Annese, Sabrina Patania, Silvia Serino, Tom Foulsham, Silvia Rossi, Azzurra Ruggeri, Dimitri Ognibene
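Abstract summary: This study examines the potential of structured examples to improve the performance of LLM-based agents within a ReAct framework. It proposes a structured solution-processing pipeline that generates three categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences (L-type). L-type examples slightly reduce clarification requests and overall action steps, but they do not yield consistent improvements.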
Reference score (independently computed attention score): 1.090218572228214
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective-taking capabilities of autonomous agents. However, tasks that involve active perception, collaborative reasoning, and perspective-taking (understanding what another agent can see or knows) pose persistent challenges for current LLM-based systems. This study investigates the potential of structured examples derived from transformed solution graphs generated by the Fast Downward planner to improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). These solutions are further converted into "thought-action" examples by prompting an LLM to explicitly articulate the reasoning behind each decision. While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements. Agents are successful in tasks requiring basic attentional filtering but struggle in scenarios that require mentalising about occluded spaces or weighing the costs of epistemic actions. These findings suggest that structured examples alone are insufficient for robust perspective-taking, underscoring the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.
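Abstract (reference translation): Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective-taking capabilities of autonomous agents. However, tasks involving active perception, collaborative reasoning, and perspective-taking (understanding what another agent can see or knows) remain a persistent challenge for current LLM-based systems. This study examines the potential of structured examples derived from transformed solution graphs generated by the Fast Downward planner. It proposes a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences that contrast alternative actions (L-type). These solutions are then converted into "thought-action" examples by prompting an LLM to state explicitly the reasoning behind each decision. L-type examples slightly reduce clarification requests and overall action steps, but they do not yield consistent improvements. Agents succeed in tasks requiring basic attentional filtering but struggle in scenarios that require mentalising about occluded spaces or weighing the costs of epistemic actions. These results suggest that structured examples alone are insufficient for robust perspective-taking, and they underscore the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.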


Related papers