Fugu-MT paper translation (abstract): Plan Verification for LLM-Based Embodied Task Completion Agents

Paper summary: Plan Verification for LLM-Based Embodied Task Completion Agents

arxiv url: http://arxiv.org/abs/2509.02761v2
Date: Thu, 04 Sep 2025 15:30:53 GMT
Status: translation complete
Last updated in system: 2025-09-05 14:03:59.174647
Title: Plan Verification for LLM-Based Embodied Task Completion Agents
Title (reference translation): Plan verification for LLM-based task completion agents
Authors: Ananth Hariharan, Vardhan Dongre, Dilek Hakkani-Tür, Gokhan Tur
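Abstract summary: Large language model (LLM) based task plans and the corresponding human demonstrations for embodied AI may be noisy. The paper proposes an iterative verification framework in which a Judge LLM critiques action sequences and a Planner LLM applies the revisions.
Reference score (independently computed attention score): 10.439882851477162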
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) based task plans and corresponding human demonstrations for embodied AI may be noisy, with unnecessary actions, redundant navigation, and logical errors that reduce policy quality. We propose an iterative verification framework in which a Judge LLM critiques action sequences and a Planner LLM applies the revisions, yielding progressively cleaner and more spatially coherent trajectories. Unlike rule-based approaches, our method relies on natural language prompting, enabling broad generalization across error types including irrelevant actions, contradictions, and missing steps. On a set of manually annotated actions from the TEACh embodied AI dataset, our framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT o4-mini, DeepSeek-R1, Gemini 2.5, LLaMA 4 Scout). The refinement loop converges quickly, with 96.5% of sequences requiring at most three iterations, while improving both temporal efficiency and spatial action organization. Crucially, the method preserves human error-recovery patterns rather than collapsing them, supporting future work on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and action refinement, we provide a scalable path to higher-quality training data for imitation learning in embodied AI.
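Abstract (reference translation): Large language model (LLM) based task plans and the corresponding human demonstrations for embodied AI can be noisy, containing unnecessary actions, redundant navigation, and logical errors that lower policy quality. This paper proposes an iterative verification framework in which a Judge LLM critiques action sequences and a Planner LLM applies the revisions. Unlike rule-based approaches, the method relies on natural language prompting and generalizes broadly across error types, including irrelevant actions, contradictions, and missing steps. On a set of manually annotated actions from the TEACh embodied AI dataset, the framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT o4-mini, DeepSeek-R1, Gemini 2.5, LLaMA 4 Scout). The refinement loop converges quickly, with 96.5% of sequences requiring at most three iterations, while improving both temporal efficiency and spatial action organization. Importantly, the method preserves human error-recovery patterns rather than collapsing them, supporting future work on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and action refinement, it provides a scalable path to higher-quality training data for imitation learning in embodied AI.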
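
The Judge/Planner refinement loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Critique` type, the `refine_plan` function, the callable interfaces, and the iteration cap of three (chosen only because the abstract reports that 96.5% of sequences need at most three iterations) are hypothetical and are not taken from the paper. The two toy functions stand in for the LLM calls, which the abstract says are driven by natural language prompting.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Plan = List[str]


@dataclass
class Critique:
    """Hypothetical critique record: which step is flagged and why."""
    step_index: int
    reason: str


# Hypothetical interfaces; the abstract does not specify how the models are called.
Judge = Callable[[Plan], List[Critique]]          # empty list means "no issues found"
Planner = Callable[[Plan, List[Critique]], Plan]  # returns a revised plan


def refine_plan(plan: Plan, judge: Judge, planner: Planner,
                max_iters: int = 3) -> Tuple[Plan, int]:
    """Alternate critique and revision until the judge is satisfied.

    Returns the final plan and the number of revision rounds applied.
    The cap `max_iters` is an assumption, not a value prescribed by the paper.
    """
    current = list(plan)
    for rounds in range(max_iters):
        critiques = judge(current)
        if not critiques:
            return current, rounds
        current = planner(current, critiques)
    return current, max_iters


# Toy stand-ins for the Judge and Planner LLMs (illustration only).
def toy_judge(plan: Plan) -> List[Critique]:
    # Flag the first action that exactly repeats the previous one.
    for i in range(1, len(plan)):
        if plan[i] == plan[i - 1]:
            return [Critique(i, "redundant repeat of the previous action")]
    return []


def toy_planner(plan: Plan, critiques: List[Critique]) -> Plan:
    # Apply the revision by dropping every flagged step.
    flagged = {c.step_index for c in critiques}
    return [action for i, action in enumerate(plan) if i not in flagged]


noisy = ["goto fridge", "goto fridge", "open fridge", "pickup apple", "pickup apple"]
cleaned, rounds = refine_plan(noisy, toy_judge, toy_planner)
print(cleaned, rounds)
```

In a real setting the two callables would wrap prompts to a Judge LLM and a Planner LLM, and the critique format would be whatever the prompts elicit; the sketch only shows the control flow of critique, revise, and stop when no critiques remain.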

Related papers