Fugu-MT paper translation (abstract): X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

Paper overview: X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

arxiv url: http://arxiv.org/abs/2605.12162v1
Date: Tue, 12 May 2026 14:13:06 GMT
Status: translation complete
Last updated in system: 2026-05-13 21:48:56.907531
Title: X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction
Authors: Kai Xiong, Hongjie Fang, Lixin Yang, Cewu Lu
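Abstract summary: X-Imitator is a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. Designed as a modular architecture, it can be seamlessly integrated into various visuomotor policies.
Reference score (attention score computed independently by the site): 47.55207856290542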
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effectively handling the interplay between spatial perception and action generation remains a critical bottleneck in robotic manipulation. Existing methods typically treat spatial perception and action execution as decoupled or strictly unidirectional processes, fundamentally restricting a robot's ability to master complex manipulation tasks. To address this, we propose X-Imitator, a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. By reciprocally conditioning current pose predictions on past actions and vice versa, this framework enables continuous mutual refinement between spatial reasoning and action generation. This joint modeling loosely resembles the internal forward models posited in human motor control. Designed as a modular architecture, the system can be seamlessly integrated into various visuomotor policies. Extensive experiments across 24 simulated and 3 real-world tasks demonstrate that our framework significantly outperforms both vanilla policies and prior methods utilizing explicit pose guidance. The code will be open sourced.
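The bidirectional loop described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the abstract gives no implementation details, so simple hand-written rules stand in for the learned pose and action paths, and every name here (`pose_path`, `action_path`, `rollout`, `blend`, `gain`) is a hypothetical chosen for illustration. The sketch only shows the data flow, where the pose estimate at each step is conditioned on the previous action and the next action is conditioned on that pose estimate.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def pose_path(observation: Vec, prev_pose: Vec, past_action: Vec,
              blend: float = 0.5) -> Vec:
    """Toy pose estimate conditioned on the past action (assumption:
    a fixed blend replaces whatever learned module the paper uses)."""
    # Forward-model-style guess: where the previous action should have moved us.
    predicted = (prev_pose[0] + past_action[0], prev_pose[1] + past_action[1])
    # Blend that guess with the current observation.
    return (blend * predicted[0] + (1.0 - blend) * observation[0],
            blend * predicted[1] + (1.0 - blend) * observation[1])


def action_path(pose: Vec, target: Vec, gain: float = 0.5) -> Vec:
    """Toy action conditioned on the current pose estimate: a proportional
    step toward the target (assumption, not a learned policy)."""
    return (gain * (target[0] - pose[0]), gain * (target[1] - pose[1]))


def rollout(start: Vec, target: Vec, steps: int = 20) -> List[Vec]:
    """Alternate the two paths: pose <- past action, action <- pose."""
    position = start           # true end-effector position (noise-free toy world)
    pose = start               # current pose estimate
    action = (0.0, 0.0)        # no action has been taken before the first step
    trajectory = [position]
    for _ in range(steps):
        pose = pose_path(position, pose, action)    # pose conditioned on past action
        action = action_path(pose, target)          # action conditioned on pose
        position = (position[0] + action[0], position[1] + action[1])
        trajectory.append(position)
    return trajectory


if __name__ == "__main__":
    path = rollout(start=(0.0, 0.0), target=(1.0, 0.5))
    print("final position:", path[-1])
```

In this noise-free setting the pose estimate always matches the true position, so the remaining error halves at every step. The point is the alternating conditioning, which a unidirectional pipeline (pose feeding action, with no path back) would lack.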
