Fugu-MT Paper Translation (Abstract): UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph

Paper overview: UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph

arxiv url: http://arxiv.org/abs/2602.13086v1
Date: Fri, 13 Feb 2026 16:47:26 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.440091
Title: UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph
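Title (reference translation): UniManip: General-Purpose Zero-Shot Robotic Manipulation Using an Agentic Operational Graph
Authors: Haichao Liu, Yuanjiang Xue, Yuheng Zhou, Haoyuan Deng, Yinan Liang, Lihua Xie, Ziwei Wang
Abstract summary: We present UniManip, a framework based on a Bi-level Agentic Operational Graph (AOG). By coupling a high-level Agentic Layer for task orchestration with a low-level Scene Layer for dynamic state representation, the system continuously aligns abstract planning with geometric constraints. Experiments evaluate the system's robust zero-shot capability on unseen objects and tasks, showing a 22.5% and 25.0% higher success rate compared to state-of-the-art VLA and hierarchical baselines, respectively.
Reference score (attention score computed by this site): 23.060488218180936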
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Achieving general-purpose robotic manipulation requires robots to seamlessly bridge high-level semantic intent with low-level physical interaction in unstructured environments. However, existing approaches falter in zero-shot generalization: end-to-end Vision-Language-Action (VLA) models often lack the precision required for long-horizon tasks, while traditional hierarchical planners suffer from semantic rigidity when facing open-world variations. To address this, we present UniManip, a framework grounded in a Bi-level Agentic Operational Graph (AOG) that unifies semantic reasoning and physical grounding. By coupling a high-level Agentic Layer for task orchestration with a low-level Scene Layer for dynamic state representation, the system continuously aligns abstract planning with geometric constraints, enabling robust zero-shot execution. Unlike static pipelines, UniManip operates as a dynamic agentic loop: it actively instantiates object-centric scene graphs from unstructured perception, parameterizes these representations into collision-free trajectories via a safety-aware local planner, and exploits structured memory to autonomously diagnose and recover from execution failures. Extensive experiments validate the system's robust zero-shot capability on unseen objects and tasks, demonstrating a 22.5% and 25.0% higher success rate compared to state-of-the-art VLA and hierarchical baselines, respectively. Notably, the system enables direct zero-shot transfer from fixed-base setups to mobile manipulation without fine-tuning or reconfiguration. Our open-source project page can be found at https://henryhcliu.github.io/unimanip.
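Abstract (reference translation): Achieving general-purpose robotic manipulation requires seamlessly bridging high-level semantic intent with low-level physical interaction in unstructured environments. End-to-end Vision-Language-Action (VLA) models often lack the precision required for long-horizon tasks, while traditional hierarchical planners suffer from semantic rigidity when facing open-world variations. This work presents UniManip, a framework grounded in a Bi-level Agentic Operational Graph (AOG). By coupling a high-level Agentic Layer for task orchestration with a low-level Scene Layer for dynamic state representation, the system continuously aligns abstract planning with geometric constraints, enabling robust zero-shot execution. Unlike static pipelines, UniManip operates as a dynamic agentic loop: it actively instantiates object-centric scene graphs from unstructured perception, parameterizes these representations into collision-free trajectories via a safety-aware local planner, and exploits structured memory to autonomously diagnose and recover from execution failures. Extensive experiments validate the system's robust zero-shot capability on unseen objects and tasks, showing a 22.5% and 25.0% higher success rate compared to state-of-the-art VLA and hierarchical baselines, respectively. Notably, the system enables direct zero-shot transfer from fixed-base setups to mobile manipulation without fine-tuning or reconfiguration. The open-source project page is available at https://henryhcliu.github.io/unimanip.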


Related papers