Fugu-MT Paper Translation (Abstract): ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking

Paper overview: ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking

arxiv url: http://arxiv.org/abs/2602.21161v1
Date: Tue, 24 Feb 2026 18:07:06 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.678957
Title: ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking
Title (reference translation): ActionReasoning: Robot Action Reasoning in 3D Space Using an LLM for Robotic Brick Stacking
Authors: Guangming Wang, Qizhen Ying, Yixiong Jing, Olaf Wysocki, Brian Sheil,
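Abstract summary: ActionReasoning is a framework that performs explicit action reasoning to produce physics-consistent, prior-guided decisions for robotic manipulation. The framework is instantiated on a tractable case study of brick stacking, where the environment states are assumed to be already accurately measured. Experiments show that the proposed multi-agent LLM framework enables stable brick placement while shifting effort from low-level domain-specific coding to high-level tool invocation.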
Reference score (independently computed attention score): 7.594306357823438
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Classical robotic systems typically rely on custom planners designed for constrained environments. While effective in restricted settings, these systems lack generalization capabilities, limiting the scalability of embodied AI and general-purpose robots. Recent data-driven Vision-Language-Action (VLA) approaches aim to learn policies from large-scale simulation and real-world data. However, the continuous action space of the physical world significantly exceeds the representational capacity of linguistic tokens, making it unclear if scaling data alone can yield general robotic intelligence. To address this gap, we propose ActionReasoning, an LLM-driven framework that performs explicit action reasoning to produce physics-consistent, prior-guided decisions for robotic manipulation. ActionReasoning leverages the physical priors and real-world knowledge already encoded in Large Language Models (LLMs) and structures them within a multi-agent architecture. We instantiate this framework on a tractable case study of brick stacking, where the environment states are assumed to be already accurately measured. The environmental states are then serialized and passed to a multi-agent LLM framework that generates physics-aware action plans. The experiments demonstrate that the proposed multi-agent LLM framework enables stable brick placement while shifting effort from low-level domain-specific coding to high-level tool invocation and prompting, highlighting its potential for broader generalization. This work introduces a promising approach to bridging perception and execution in robotic manipulation by integrating physical reasoning with LLMs.
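Abstract (reference translation): Classical robotic systems typically rely on custom planners designed for constrained environments. Although effective in restricted settings, these systems lack the ability to generalize, which limits the scalability of embodied AI and general-purpose robots. Recent Vision-Language-Action (VLA) approaches aim to learn policies from large-scale simulation and real-world data. However, the continuous action space of the physical world far exceeds the representational capacity of linguistic tokens, so it is unclear whether scaling data alone can yield general robotic intelligence. To address this gap, we propose ActionReasoning, an LLM-driven framework that performs explicit action reasoning to produce physics-consistent, prior-guided decisions for robotic manipulation. ActionReasoning draws on the physical priors and real-world knowledge already encoded in Large Language Models (LLMs) and structures them within a multi-agent architecture. We instantiate this framework on a tractable case study of brick stacking, where the environment states are assumed to be already accurately measured. The environment states are serialized and passed to a multi-agent LLM framework that generates physics-aware action plans. Experiments show that the proposed multi-agent LLM framework enables stable brick placement while shifting effort from low-level domain-specific coding to high-level tool invocation, highlighting its potential for broader generalization. This work introduces a promising approach to bridging perception and execution in robotic manipulation by integrating physical reasoning with LLMs.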
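
The abstract states that measured environment states are serialized before being passed to the multi-agent LLM framework, but it does not give the format. As a rough illustration only, the sketch below serializes a brick scene to JSON in Python. The `BrickState` fields, the `serialize_scene` function, the units, and the JSON layout are all assumptions made for this example and are not taken from the paper.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BrickState:
    """Hypothetical record for one measured brick (not the paper's schema)."""
    brick_id: str
    position_m: tuple  # assumed (x, y, z) of the brick center, in meters
    size_m: tuple      # assumed (length, width, height), in meters
    yaw_deg: float     # assumed rotation about the vertical axis, in degrees


def serialize_scene(bricks):
    """Turn a list of BrickState records into a JSON string for an LLM prompt."""
    payload = {
        "units": "meters",
        "bricks": [asdict(brick) for brick in bricks],
    }
    # sort_keys keeps the text stable across runs, which makes prompts reproducible.
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    scene = [
        BrickState("b0", (0.0, 0.0, 0.03), (0.2, 0.1, 0.06), 0.0),
        BrickState("b1", (0.1, 0.0, 0.09), (0.2, 0.1, 0.06), 0.0),
    ]
    print(serialize_scene(scene))
```

A text format such as JSON is one plausible choice here because it lets exact numeric poses reach the model as plain tokens; the paper may use a different representation.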


Related papers