Fugu-MT Paper Translation (Abstract): EgoInfinity: A Web-Scale 4D Hand-Object Interaction Data Engine for Any-View Robot Retargeting and Video-to-Action Robot Learning

Paper overview: EgoInfinity: A Web-Scale 4D Hand-Object Interaction Data Engine for Any-View Robot Retargeting and Video-to-Action Robot Learning

arxiv url: http://arxiv.org/abs/2606.17385v2
Date: Fri, 19 Jun 2026 08:44:25 GMT
Status: Translation complete
Last updated in system: 2026-06-23 13:41:30.9201
Title: EgoInfinity: A Web-Scale 4D Hand-Object Interaction Data Engine for Any-View Robot Retargeting and Video-to-Action Robot Learning
Title (reference translation): EgoInfinity: A Web-Scale 4D Hand-Object Interaction Data Engine
Authors: Gaotian Wang, Kejia Ren, Andrew Morgan, Yiting Chen, Howard H. Qian, Podshara Chanrungmaneekul, Kaiyu Hang
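Abstract summary: EgoInfinity is a universal 4D hand-object interaction data engine that enables web-scale data generation for robot retargeting and learning. It is a modular engine that integrates perception, segmentation, reconstruction, and interaction-aware refinement to automate the traditionally unscalable video-to-action problem.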
Reference score (independently computed attention score): 10.780924973366737
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Internet videos constitute the largest reservoir of embodied human manipulation knowledge, yet converting arbitrary RGB footage into actionable robot training data remains a major bottleneck. Existing lab- or factory-collected datasets are narrow in scale and diversity, limiting open-world robot learning. Instead of proposing a static dataset, we introduce EgoInfinity, a universal 4D hand-object interaction data engine that enables web-scale data generation for robot retargeting and learning. EgoInfinity is a modular engine integrating perception, segmentation, reconstruction, interaction-aware refinement, and retargeting to automate this traditionally unscalable video-to-action problem without human-in-the-loop annotation. Its modular design lets the engine continuously benefit from advances in any incorporated component. With EgoInfinity, in-the-wild human manipulation videos are lifted into agent-agnostic, metric 4D hand-object representations, including hand trajectories, 6-DoF object poses, and contact-relevant states. Rather than naively connecting standalone components, EgoInfinity combines cross-module metric calibration with interaction-aware refinement to improve physical reliability, reducing drift and contact inconsistencies common in pure visual reconstruction. We further propose a novel motion retargeter that compiles the recovered 3D hand motions into executable joint trajectories for diverse robot morphologies, enabling video-to-action retargeting on any robot from arbitrary viewpoints and shot sizes (e.g., the human body is only partially visible). We validate EgoInfinity across perception fidelity, kinematic feasibility, contact consistency, cross-embodiment generalization, and real-robot skill acquisition (e.g., grasping, cutting, wiping, and pouring), demonstrating a scalable bridge from internet videos to executable robot behavior for open-world robot learning.
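Abstract (reference translation): Internet videos are the largest reservoir of embodied human manipulation knowledge, yet converting arbitrary RGB footage into actionable robot training data remains a major bottleneck. Existing datasets collected in labs or factories are limited in scale and diversity, which restricts open-world robot learning. Instead of proposing a static dataset, we introduce EgoInfinity, a universal 4D hand-object interaction data engine that enables web-scale data generation for robot retargeting and learning. EgoInfinity is a modular engine that integrates perception, segmentation, reconstruction, interaction-aware refinement, and retargeting. Its modular design lets the engine benefit continuously from advances in any incorporated component. With EgoInfinity, in-the-wild human manipulation videos are lifted into agent-agnostic, metric 4D hand-object representations, including hand trajectories, 6-DoF object poses, and contact-relevant states. Rather than naively connecting standalone components, EgoInfinity combines cross-module metric calibration with interaction-aware refinement to improve physical reliability, reducing the drift and contact inconsistencies common in purely visual reconstruction. We further propose a motion retargeter that compiles the recovered 3D hand motions into executable joint trajectories for diverse robot morphologies, enabling video-to-action retargeting on any robot from arbitrary viewpoints and shot sizes (e.g., when the human body is only partially visible). We validate EgoInfinity across perception fidelity, kinematic feasibility, contact consistency, cross-embodiment generalization, and real-robot skill acquisition (e.g., grasping, cutting, wiping, and pouring), demonstrating a scalable bridge from internet videos to executable robot behavior for open-world robot learning.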

Related papers