Fugu-MT paper translation (summary): EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

Paper summary: EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

arxiv url: http://arxiv.org/abs/2602.23205v1
Date: Thu, 26 Feb 2026 16:53:41 GMT
Status: Translation complete
Last updated in system: 2026-02-27 18:41:22.793062
Title: EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents
Title (reference translation): EmbodMocap: 4D human-scene reconstruction in the wild for embodied agents
Authors: Wenjia Wang, Liang Pan, Huaijin Pi, Yuke Lou, Xuqian Ren, Yifan Wu, Zhouyingcheng Liao, Lei Yang, Rishabh Dabral, Christian Theobalt, Taku Komura
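Abstract summary: EmbodMocap is a portable, affordable data collection pipeline that uses two moving iPhones. Our key idea is to jointly calibrate dual RGB-D sequences to reconstruct both humans and scenes. Based on the collected data, we empower three embodied AI tasks: monocular human-scene reconstruction, where feedforward models are fine-tuned to output metric-scale, world-space-aligned humans and scenes; physics-based character animation; and robot motion control.
Reference score (attention score computed in-house): 85.77432303199176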
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human behaviors in the real world naturally encode rich, long-term contextual information that can be leveraged to train embodied agents for perception, understanding, and acting. However, existing capture systems typically rely on costly studio setups and wearable devices, limiting the large-scale collection of scene-conditioned human motion data in the wild. To address this, we propose EmbodMocap, a portable and affordable data collection pipeline using two moving iPhones. Our key idea is to jointly calibrate dual RGB-D sequences to reconstruct both humans and scenes within a unified metric world coordinate frame. The proposed method allows metric-scale and scene-consistent capture in everyday environments without static cameras or markers, bridging human motion and scene geometry seamlessly. Compared with optical capture ground truth, we demonstrate that the dual-view setting markedly mitigates depth ambiguity, achieving better alignment and reconstruction performance than a single iPhone or monocular models. Based on the collected data, we empower three embodied AI tasks: monocular human-scene reconstruction, where we fine-tune feedforward models that output metric-scale, world-space-aligned humans and scenes; physics-based character animation, where we show that our data can be used to scale human-object interaction skills and scene-aware motion tracking; and robot motion control, where we train a humanoid robot via sim-to-real RL to replicate human motions depicted in videos. Experimental results validate the effectiveness of our pipeline and its contributions toward advancing embodied AI research.
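Abstract (reference translation): Real-world human behavior naturally encodes rich, long-term contextual information that can be used to train embodied agents for perception, understanding, and action. However, existing capture systems usually depend on costly studio setups and wearable devices, which limits the large-scale collection of scene-conditioned human motion data. This work therefore proposes EmbodMocap, a portable, low-cost data collection pipeline. Our key idea is to jointly calibrate two RGB-D sequences and reconstruct humans and scenes within a unified metric world coordinate frame. The proposed method captures human motion and scene geometry seamlessly, without static cameras or markers. Compared with optical-capture ground truth, we show that the dual-view setting reduces depth ambiguity and achieves better alignment and reconstruction performance than a single iPhone or monocular models. The collected data supports three tasks: monocular human-scene reconstruction, using fine-tuned feedforward models that output metric-scale, world-space-aligned humans and scenes; physics-based character animation, where the data can be used to scale up human-object interaction skills and scene-aware motion tracking; and robot motion control, where a humanoid robot is trained with sim-to-real RL to reproduce human motions shown in videos. Experimental results validate the effectiveness of our pipeline and its contribution to advancing embodied AI research.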

Related papers