Fugu-MT Paper Translation (Summary): Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation

Paper summary: Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation

arxiv url: http://arxiv.org/abs/2503.08084v1
Date: Tue, 11 Mar 2025 06:37:33 GMT
Status: Translation complete
System update date: 2025-03-12 22:35:51.590971
Title: Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation
Authors: Fangyuan Wang, Shipeng Lyu, Peng Zhou, Anqing Duan, Guodong Guo, David Navarro-Alarcon
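Abstract summary: We present the Instruction-Augmented Long-Horizon Planning (IALP) system. The results show that the IALP system can efficiently solve the tasks with an average success rate exceeding 80%.
Reference score (attention score computed independently by this site): 39.43049944895508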
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Enabling humanoid robots to perform long-horizon mobile manipulation planning in real-world environments based on embodied perception and comprehension abilities has been a longstanding challenge. With the recent rise of large language models (LLMs), there has been a notable increase in the development of LLM-based planners. These approaches either utilize human-provided textual representations of the real world or heavily depend on prompt engineering to extract such representations, lacking the capability to quantitatively understand the environment, such as determining the feasibility of manipulating objects. To address these limitations, we present the Instruction-Augmented Long-Horizon Planning (IALP) system, a novel framework that employs LLMs to generate feasible and optimal actions based on real-time sensor feedback, including grounded knowledge of the environment, in a closed-loop interaction. Distinct from prior works, our approach augments user instructions into PDDL problems by leveraging both the abstract reasoning capabilities of LLMs and grounding mechanisms. By conducting various real-world long-horizon tasks, each consisting of seven distinct manipulatory skills, our results demonstrate that the IALP system can efficiently solve these tasks with an average success rate exceeding 80%. Our proposed method can operate as a high-level planner, equipping robots with substantial autonomy in unstructured environments through the utilization of multi-modal sensor inputs.
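The abstract says user instructions are augmented into PDDL problems using grounded knowledge of the environment, including whether an object can feasibly be manipulated. The following is a minimal sketch of that general idea only, not the authors' implementation: the `GroundedObject` structure, the `at` predicate, the `item` and `place` types, and the `graspable` flag are all hypothetical names chosen for illustration, and the abstract does not specify the actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedObject:
    """An object reported by perception (hypothetical structure)."""
    name: str
    location: str
    graspable: bool  # assumed feasibility flag from a grounding step


def build_pddl_problem(problem_name, domain_name, objects, places, goal_facts):
    """Render a PDDL problem string from grounded objects and goal facts.

    Objects flagged as not graspable are left out, so a planner cannot
    select an object the grounding step judged infeasible to manipulate.
    """
    usable = [o for o in objects if o.graspable]
    if not usable:
        raise ValueError("no feasible objects to plan over")
    object_names = " ".join(o.name for o in usable)
    place_names = " ".join(places)
    init = "\n    ".join(f"(at {o.name} {o.location})" for o in usable)
    goal = " ".join(f"({fact})" for fact in goal_facts)
    return (
        f"(define (problem {problem_name})\n"
        f"  (:domain {domain_name})\n"
        f"  (:objects {object_names} - item {place_names} - place)\n"
        f"  (:init\n    {init})\n"
        f"  (:goal (and {goal})))\n"
    )


if __name__ == "__main__":
    scene = [
        GroundedObject("cup", "shelf", graspable=True),
        GroundedObject("fridge", "kitchen", graspable=False),
    ]
    print(build_pddl_problem(
        "tidy", "mobile-manipulation", scene,
        ["shelf", "kitchen", "table"], ["at cup table"],
    ))
```

In a closed-loop setting, a function like this could be called again after each sensor update so that the problem file reflects the current scene. The goal facts would come from the LLM's reading of the user instruction, which this sketch takes as given.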

Related papers

Dynamic Path Navigation for Motion Agents with LLM Reasoning [69.5875073447454]
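Large language models (LLMs) have demonstrated strong, generalizable reasoning and planning capabilities. This work investigates the zero-shot navigation and path-generation capabilities of LLMs by constructing a dataset and proposing an evaluation protocol. When such tasks are framed appropriately, modern LLMs show considerable planning ability, avoiding obstacles while autonomously refining their navigation with the generated motions to reach the goal.
Paper reference translation (metadata) (2025-03-10T13:39:09Z)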
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
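Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they struggle with problems that require multi-step decision-making and environmental feedback. We propose a framework that automatically learns a reward model from the environment without human annotations.
Paper reference translation (metadata) (2025-02-17T18:49:25Z)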
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
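Large language models (LLMs) have demonstrated strong planning abilities across diverse domains, including robotic manipulation and navigation. We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents. We evaluate the approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robot manipulation in a zero-shot setting.
Paper reference translation (metadata) (2024-11-26T17:53:44Z)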
EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI [7.040779338576156]
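Large language models (LLMs) can generate text plans or control code for robots. These methods still face challenges in flexibility and applicability across different environments. This paper proposes EnvBridge to improve the adaptability and robustness of robot manipulation agents.
Paper reference translation (metadata) (2024-10-22T11:52:22Z)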
Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model [6.9268843428933025]
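Large language models (LLMs) have demonstrated strong planning and reasoning abilities for understanding and processing semantic information. This paper proposes a novel language-model-based framework that lets a robot autonomously plan behaviors and low-level execution under given textual instructions.
Paper reference translation (metadata) (2024-08-15T17:33:32Z)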
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
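Large language models (LLMs) can mimic human-like intelligence. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
Paper reference translation (metadata) (2024-07-07T07:15:49Z)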
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning [32.045840007623276]
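This paper introduces Robotic Vision-Language Planning (ViLa). ViLa integrates perceptual data directly into its reasoning and planning process. Evaluations conducted on both real robots and in simulated environments show that ViLa outperforms existing LLM-based planners.
Paper reference translation (metadata) (2023-11-29T17:46:25Z)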
Efficient Learning of High Level Plans from Play [57.29562823883257]
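This paper introduces ELF-P, a framework for robot learning that bridges motion planning and deep RL. We show that ELF-P achieves far better sample efficiency than relevant baselines across multiple realistic manipulation tasks.
Paper reference translation (metadata) (2023-03-16T20:09:47Z)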
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models [19.623070762485494]
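Large language models (LLMs) have shown remarkable reasoning abilities in few-shot robotic planning. This work shows that LLMs can provide high-level planning and reasoning abilities and control interactive robot behavior in a multimodal environment.
Paper reference translation (metadata) (2023-03-14T23:01:27Z)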

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
