Fugu-MT Paper Translation (Abstract): From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?

Paper abstract: From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?

arxiv url: http://arxiv.org/abs/2603.03148v1
Date: Tue, 03 Mar 2026 16:36:06 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.865901
Title: From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?
Title (reference translation): From Language to Action: Can LLM-Based Agents Be Used for Robot Cognition?
Authors: Shinas Shaji, Fabian Huppertz, Alex Mitrevski, Sebastian Houben
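Abstract summary: Large language models (LLMs) have been shown to exhibit emergent cognitive aspects such as reasoning and language understanding. This paper proposes a cognitive architecture in which an agentic LLM serves as the core component for planning and reasoning. The proposed system is evaluated on two tasks that assess the agent's reasoning, planning, and memory utilisation.
Reference score (independently computed attention score): 1.4016147623265656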
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In order to flexibly act in an everyday environment, a robotic agent needs a variety of cognitive capabilities that enable it to reason about plans and perform execution recovery. Large language models (LLMs) have been shown to demonstrate emergent cognitive aspects, such as reasoning and language understanding; however, the ability to control embodied robotic agents requires reliably bridging high-level language to low-level functionalities for perception and control. In this paper, we investigate the extent to which an LLM can serve as a core component for planning and execution reasoning in a cognitive robot architecture. For this purpose, we propose a cognitive architecture in which an agentic LLM serves as the core component for planning and reasoning, while components for working and episodic memories support learning from experience and adaptation. An instance of the architecture is then used to control a mobile manipulator in a simulated household environment, where environment interaction is done through a set of high-level tools for perception, reasoning, navigation, grasping, and placement, all of which are made available to the LLM-based agent. We evaluate our proposed system on two household tasks (object placement and object swapping), which evaluate the agent's reasoning, planning, and memory utilisation. The results demonstrate that the LLM-driven agent can complete structured tasks and exhibits emergent adaptation and memory-guided planning, but also reveal significant limitations, such as hallucinations about the task success and poor instruction following by refusing to acknowledge and complete sequential tasks. These findings highlight both the potential and challenges of employing LLMs as embodied cognitive controllers for autonomous robots.
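Abstract (reference translation): To act flexibly in everyday environments, a robotic agent needs a variety of cognitive capabilities. Large language models (LLMs) have been shown to exhibit emergent cognitive aspects such as reasoning and language understanding, but controlling embodied robotic agents requires reliably bridging high-level language to low-level functionalities for perception and control. This paper investigates the extent to which an LLM can serve as a core component for planning and execution reasoning in a cognitive robot architecture. It proposes a cognitive architecture in which an agentic LLM serves as the core component for planning and reasoning. An instance of the architecture is used to control a mobile manipulator in a simulated household environment, where interaction with the environment takes place through a set of high-level tools for perception, reasoning, navigation, grasping, and placement. The proposed system is evaluated on two household tasks (object placement and object swapping) that assess the agent's reasoning, planning, and memory utilisation. The results show that the LLM-driven agent can complete structured tasks and exhibits emergent adaptation and memory-guided planning, but they also reveal significant limitations, such as hallucinations about task success and poor instruction following, in which the agent refuses to acknowledge and complete sequential tasks. These findings highlight both the potential and the challenges of employing LLMs as embodied cognitive controllers for autonomous robots.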
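The architecture described in the abstract (an agent that plans, calls high-level tools, and keeps working and episodic memories) can be illustrated with a small sketch. This is not the authors' implementation: the names World, Agent, and scripted_policy are hypothetical, the household is reduced to a dictionary of object locations, and the LLM is replaced by a scripted policy that replays a fixed plan so the example runs without any model. The final check reads the world state directly rather than trusting the agent's own report, which is one simple way to guard against the hallucinated task success mentioned in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, dict]  # (tool name, keyword arguments)


@dataclass
class World:
    """Toy stand-in for the simulated household: object name -> location."""
    objects: Dict[str, str]
    robot_at: str = "dock"
    holding: Optional[str] = None


@dataclass
class Agent:
    """Hypothetical agent loop; `policy` stands in for the LLM (assumption)."""
    world: World
    policy: Callable[[str, List[str]], Optional[Action]]
    working_memory: List[str] = field(default_factory=list)         # current episode
    episodic_memory: List[List[str]] = field(default_factory=list)  # past episodes

    def tools(self) -> Dict[str, Callable[..., str]]:
        # High-level tools exposed to the policy (a subset of those in the abstract).
        return {"perceive": self.perceive, "navigate": self.navigate,
                "grasp": self.grasp, "place": self.place}

    def perceive(self) -> str:
        here = [o for o, loc in self.world.objects.items()
                if loc == self.world.robot_at]
        return f"at {self.world.robot_at}, visible: {sorted(here)}"

    def navigate(self, to: str) -> str:
        self.world.robot_at = to
        return f"moved to {to}"

    def grasp(self, obj: str) -> str:
        # Fails if the gripper is full or the object is not at the robot's location.
        if self.world.holding or self.world.objects.get(obj) != self.world.robot_at:
            return f"failed to grasp {obj}"
        self.world.holding, self.world.objects[obj] = obj, "gripper"
        return f"grasped {obj}"

    def place(self) -> str:
        if self.world.holding is None:
            return "nothing to place"
        obj, self.world.holding = self.world.holding, None
        self.world.objects[obj] = self.world.robot_at
        return f"placed {obj} at {self.world.robot_at}"

    def run(self, task: str, max_steps: int = 10) -> List[str]:
        self.working_memory = [f"task: {task}"]
        for _ in range(max_steps):
            action = self.policy(task, self.working_memory)
            if action is None:  # the policy signals that it is done
                break
            name, kwargs = action
            result = self.tools()[name](**kwargs)
            self.working_memory.append(f"{name}({kwargs}) -> {result}")
        # Archive the finished episode so later runs could consult it.
        self.episodic_memory.append(list(self.working_memory))
        return self.working_memory


def scripted_policy(plan: List[Action]) -> Callable[[str, List[str]], Optional[Action]]:
    """Replay a fixed plan; a real system would query an LLM here."""
    def policy(task: str, memory: List[str]) -> Optional[Action]:
        step = len(memory) - 1  # memory[0] is the task line
        return plan[step] if step < len(plan) else None
    return policy


plan = [("navigate", {"to": "table"}), ("grasp", {"obj": "cup"}),
        ("navigate", {"to": "shelf"}), ("place", {})]
world = World(objects={"cup": "table"})
agent = Agent(world, scripted_policy(plan))
trace = agent.run("put the cup on the shelf")

# Verify success against the world state, not against the agent's own claim.
task_succeeded = world.objects["cup"] == "shelf"
```

Swapping scripted_policy for a function that prompts a language model with the task and the working-memory trace would turn the sketch into an LLM-driven loop. The tool interface and the state-based success check would stay the same.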
