Fugu-MT Paper Translation (Abstract): TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks

Paper overview: TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks

arxiv url: http://arxiv.org/abs/2601.14550v1
Date: Wed, 21 Jan 2026 00:14:28 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.872624
Title: TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks
Title (reference translation): TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks
Authors: Tailai Cheng, Kejia Chen, Lingyun Chen, Liding Zhang, Yue Zhang, Yao Ling, Mahdi Hamad, Zhenshan Bing, Fan Wu, Karan Sharma, Alois Knoll
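Abstract summary: We build on the idea of the handheld demonstration device Universal Manipulation Interface (UMI). This paper introduces TacUMI, a multi-modal data collection system that integrates ViTac sensors, a force-torque sensor, and a pose tracker into a robot-compatible gripper. We then propose a multi-modal segmentation framework that leverages temporal models to detect semantically meaningful event boundaries.
Reference score (independently computed attention score): 35.05859151174601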
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Task decomposition is critical for understanding and learning complex long-horizon manipulation tasks. Especially for tasks involving rich physical interactions, relying solely on visual observations and robot proprioceptive information often fails to reveal the underlying event transitions. This raises the requirement for efficient collection of high-quality multi-modal data as well as a robust segmentation method to decompose demonstrations into meaningful modules. Building on the idea of the handheld demonstration device Universal Manipulation Interface (UMI), we introduce TacUMI, a multi-modal data collection system that additionally integrates ViTac sensors, a force-torque sensor, and a pose tracker into a compact, robot-compatible gripper design, enabling synchronized acquisition of all these modalities during human demonstrations. We then propose a multi-modal segmentation framework that leverages temporal models to detect semantically meaningful event boundaries in sequential manipulations. Evaluation on a challenging cable mounting task shows more than 90 percent segmentation accuracy and highlights a remarkable improvement with more modalities, which validates that TacUMI establishes a practical foundation for both scalable collection and segmentation of multi-modal demonstrations in contact-rich tasks.
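Abstract (reference translation): Task decomposition is essential for understanding and learning complex long-horizon manipulation tasks. In particular, for tasks involving rich physical interaction, relying only on visual observations and robot proprioceptive information often fails to reveal the underlying event transitions. This calls for efficient collection of high-quality multi-modal data and for robust segmentation that decomposes demonstrations into meaningful modules. Building on the idea of the handheld demonstration device Universal Manipulation Interface (UMI), we introduce TacUMI, a multi-modal data collection system that integrates ViTac sensors, a force-torque sensor, and a pose tracker into a compact, robot-compatible gripper design. We then propose a multi-modal segmentation framework that leverages temporal models to detect semantically meaningful event boundaries in sequential manipulations. Evaluation on a challenging cable mounting task shows more than 90 percent segmentation accuracy, with further improvement as more modalities are added.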


Related papers