Fugu-MT Paper Translation (Abstract): Architecting Large Action Models for Human-in-the-Loop Intelligent Robots

Paper overview: Architecting Large Action Models for Human-in-the-Loop Intelligent Robots

arxiv url: http://arxiv.org/abs/2512.11620v1
Date: Fri, 12 Dec 2025 14:58:27 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.304539
Title: Architecting Large Action Models for Human-in-the-Loop Intelligent Robots
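Title (reference translation): Building Large Action Models for Human-Facing Intelligent Robots
Authors: Kanisorn Sangchai, Methasit Boonpun, Withawin Kraipetchara, Paulo Garcia
Abstract summary: We show that competent Large Action Models can be built by composing off-the-shelf foundation models. Experiments on a multi-modal robot demonstrate that Large Action Model intelligence does not require massive end-to-end training.
Reference score (independently computed attention score): 0.6999740786886536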
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The realization of intelligent robots, operating autonomously and interacting with other intelligent agents, human or artificial, requires the integration of environment perception, reasoning, and action. Classic Artificial Intelligence techniques for this purpose, focusing on symbolic approaches, have long-ago hit the scalability wall on compute and memory costs. Advances in Large Language Models in the past decade (neural approaches) have resulted in unprecedented displays of capability, at the cost of control, explainability, and interpretability. Large Action Models aim at extending Large Language Models to encompass the full perception, reasoning, and action cycle; however, they typically require substantially more comprehensive training and suffer from the same deficiencies in reliability. Here, we show it is possible to build competent Large Action Models by composing off-the-shelf foundation models, and that their control, interpretability, and explainability can be effected by incorporating symbolic wrappers and associated verification on their outputs, achieving verifiable neuro-symbolic solutions for intelligent robots. Our experiments on a multi-modal robot demonstrate that Large Action Model intelligence does not require massive end-to-end training, but can be achieved by integrating efficient perception models with a logic-driven core. We find that driving action execution through the generation of Planning Domain Definition Language (PDDL) code enables a human-in-the-loop verification stage that effectively mitigates action hallucinations. These results can support practitioners in the design and development of robotic Large Action Models across novel industries, and shed light on the ongoing challenges that must be addressed to ensure safety in the field.
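Abstract (reference translation): Realizing intelligent robots that operate autonomously and interact with other intelligent agents, human or artificial, requires integrating environment perception, reasoning, and action. Classical artificial intelligence techniques for this purpose, which focus on symbolic approaches, long ago hit a scalability wall in compute and memory costs. Advances in Large Language Models over the past decade (neural approaches) have produced unprecedented displays of capability at the cost of control, explainability, and interpretability. Large Action Models aim to extend Large Language Models to cover the full perception, reasoning, and action cycle, but they typically require more comprehensive training and suffer from the same reliability deficiencies. This work shows that competent Large Action Models can be built by composing off-the-shelf foundation models, and that their control, interpretability, and explainability can be achieved by incorporating symbolic wrappers and associated verification on their outputs, yielding verifiable neuro-symbolic solutions for intelligent robots. Experiments on a multi-modal robot show that Large Action Model intelligence does not require massive end-to-end training and can instead be achieved by integrating efficient perception models with a logic-driven core. Driving action execution through the generation of Planning Domain Definition Language (PDDL) code was found to enable a human-in-the-loop verification stage that effectively mitigates action hallucinations. These results can support practitioners in designing and developing robotic Large Action Models in new industries, and shed light on the challenges that must be addressed to ensure safety in the field.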
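
Illustrative sketch (not the authors' implementation): the abstract describes wrapping model output in symbolic checks and gating execution on human approval of generated PDDL. The Python below shows one way such a gate could look for a plan written as PDDL-style steps. The action table `DOMAIN_ACTIONS`, the object set `KNOWN_OBJECTS`, and the functions `parse_plan`, `check_step`, and `verify_plan` are all hypothetical names introduced here; the paper's actual domain, checks, and interfaces are not given in this abstract.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action schema: action name -> expected number of arguments.
# In a real PDDL setup this would be read from the domain file.
DOMAIN_ACTIONS: Dict[str, int] = {"move": 2, "pick": 2, "place": 2}

# Hypothetical set of objects that a perception stage reports as present.
KNOWN_OBJECTS = {"robot", "table", "shelf", "cup"}

Step = Tuple[str, ...]  # e.g. ("pick", "cup", "table")


def parse_plan(text: str) -> List[Step]:
    """Parse lines such as '(pick cup table)' into tuples of tokens."""
    steps: List[Step] = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if not (line.startswith("(") and line.endswith(")")):
            raise ValueError(f"malformed step: {line!r}")
        steps.append(tuple(line[1:-1].split()))
    return steps


def check_step(step: Step) -> List[str]:
    """Return a list of problems; an empty list means the step passes."""
    if not step:
        return ["empty step"]
    name, args = step[0], step[1:]
    problems: List[str] = []
    if name not in DOMAIN_ACTIONS:
        problems.append(f"unknown action {name!r}")
    elif len(args) != DOMAIN_ACTIONS[name]:
        problems.append(
            f"{name!r} expects {DOMAIN_ACTIONS[name]} arguments, got {len(args)}"
        )
    problems += [f"unknown object {a!r}" for a in args if a not in KNOWN_OBJECTS]
    return problems


def verify_plan(text: str, approve: Callable[[List[Step]], bool]) -> List[Step]:
    """Symbolic check first, then a human decision; return the steps if both pass."""
    steps = parse_plan(text)
    for step in steps:
        problems = check_step(step)
        if problems:
            # A step naming an undeclared action or object is rejected outright.
            raise ValueError(f"rejected {step}: {'; '.join(problems)}")
    if not approve(steps):
        raise PermissionError("plan rejected by human reviewer")
    return steps
```

In this sketch a hallucinated step such as `(fly robot moon)` fails the symbolic check before any human sees it, and a well-formed plan still runs only if the `approve` callback (for example, a prompt shown to an operator) returns `True`.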

Related papers