Fugu-MT Paper Translation (Abstract): OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction

Paper abstract: OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction

arxiv url: http://arxiv.org/abs/2604.10647v1
Date: Sun, 12 Apr 2026 13:48:48 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.148508
Title: OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction
Title (reference translation): OmniUMI: Toward Physically Grounded Robot Learning through Human-Aligned Multimodal Interaction
Authors: Shaqi Luo, Yuanyuan Li, Youhao Hu, Chenhao Yu, Chaoran Xu, Jiachen Zhang, Guocai Yao, Tiejun Huang, Ran He, Zhongyuan Wang
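Abstract summary: UMI-style interfaces enable scalable robot learning, but existing systems remain largely visuomotor. OmniUMI is a unified framework for physically grounded robot learning via human-aligned multimodal interaction.
Reference score (independently computed attention score): 41.5123936517904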
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: UMI-style interfaces enable scalable robot learning, but existing systems remain largely visuomotor, relying primarily on RGB observations and trajectory while providing only limited access to physical interaction signals. This becomes a fundamental limitation in contact-rich manipulation, where success depends on contact dynamics such as tactile interaction, internal grasping force, and external interaction wrench that are difficult to infer from vision alone. We present OmniUMI, a unified framework for physically grounded robot learning via human-aligned multimodal interaction. OmniUMI synchronously captures RGB, depth, trajectory, tactile sensing, internal grasping force, and external interaction wrench within a compact handheld system, while maintaining collection-deployment consistency through a shared embodiment design. To support human-aligned demonstration, OmniUMI provides dual-force feedback through bilateral gripper feedback and natural perception of external interaction wrench in the handheld embodiment. Built on this interface, we extend diffusion policy with visual, tactile, and force-related observations, and deploy the learned policy through impedance-based execution for unified regulation of motion and contact behavior. Experiments demonstrate reliable sensing and strong downstream performance on force-sensitive pick-and-place, interactive surface erasing, and tactile-informed selective release. Overall, OmniUMI combines physically grounded multimodal data acquisition with human-aligned interaction, providing a scalable foundation for learning contact-rich manipulation.
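Abstract (reference translation): UMI-style interfaces enable scalable robot learning, but existing systems rely mainly on RGB observations and trajectories and offer only limited access to physical interaction signals. This is a fundamental limitation for contact-rich manipulation, where success depends on contact dynamics, such as tactile interaction, internal grasping force, and external interaction wrench, that are difficult to infer from vision alone. OmniUMI is a unified framework for physically grounded robot learning via human-aligned multimodal interaction. Within a compact handheld system, OmniUMI synchronously captures RGB, depth, trajectory, tactile sensing, internal grasping force, and external interaction wrench, and maintains collection-deployment consistency through a shared embodiment design. To support human-aligned demonstration, OmniUMI provides dual-force feedback through bilateral gripper feedback and natural perception of the external interaction wrench in the handheld embodiment. Building on this interface, the authors extend diffusion policy with visual, tactile, and force-related observations and deploy the learned policy through impedance-based execution, which regulates motion and contact behavior in a unified way. Experiments show reliable sensing and strong downstream performance on force-sensitive pick-and-place, interactive surface erasing, and tactile-informed selective release. Overall, OmniUMI combines physically grounded multimodal data acquisition with human-aligned interaction, providing a scalable foundation for learning contact-rich manipulation.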
