Fugu-MT Paper Translation (Summary): EgoSim: Egocentric World Simulator for Embodied Interaction Generation

Paper summary: EgoSim: Egocentric World Simulator for Embodied Interaction Generation

arxiv url: http://arxiv.org/abs/2604.01001v1
Date: Wed, 01 Apr 2026 15:00:46 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:32.049617
Title: EgoSim: Egocentric World Simulator for Embodied Interaction Generation
Authors: Jinkun Hao, Mingda Jia, Ruiyan Wang, Xihui Liu, Ran Yi, Lizhuang Ma, Jiangmiao Pang, Xudong Xu,
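Abstract summary: EgoSim is a closed-loop egocentric world simulator that generates spatially consistent interaction videos. To support continuous simulation, it persistently updates the underlying 3D scene state. EgoSim significantly outperforms existing methods in visual quality, spatial consistency, and generalization.
Reference score (attention score computed by the site): 93.11209644808783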
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce EgoSim, a closed-loop egocentric world simulator that generates spatially consistent interaction videos and persistently updates the underlying 3D scene state for continuous simulation. Existing egocentric simulators either lack explicit 3D grounding, causing structural drift under viewpoint changes, or treat the scene as static, failing to update world states across multi-stage interactions. EgoSim addresses both limitations by modeling 3D scenes as updatable world states. We generate embodiment interactions via a Geometry-action-aware Observation Simulation model, with spatial consistency from an Interaction-aware State Updating module. To overcome the critical data bottleneck posed by the difficulty in acquiring densely aligned scene-interaction training pairs, we design a scalable pipeline that extracts static point clouds, camera trajectories, and embodiment actions from in-the-wild large-scale monocular egocentric videos. We further introduce EgoCap, a capture system that enables low-cost real-world data collection with uncalibrated smartphones. Extensive experiments demonstrate that EgoSim significantly outperforms existing methods in terms of visual quality, spatial consistency, and generalization to complex scenes and in-the-wild dexterous interactions, while supporting cross-embodiment transfer to robotic manipulation. Code and datasets will be released soon. The project page is at egosimulator.github.io.
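The closed-loop idea in the abstract (render observations from an explicit 3D scene state, then update that state after each interaction so later views stay consistent) can be illustrated with a toy sketch. This is not EgoSim's implementation: the learned Geometry-action-aware Observation Simulation model and Interaction-aware State Updating module are replaced by hypothetical stand-ins. The names `WorldState`, `project`, `simulate_observation`, `update_state`, and `rollout` are illustrative assumptions, as are the labeled point-cloud state, the pinhole camera with identity rotation, and the rigid-translation update.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class WorldState:
    """Hypothetical updatable 3D world state: a point cloud with one object label per point."""
    points: List[Point]
    labels: List[str]


def project(points: List[Point], cam_t: Point, f: float = 1.0) -> List[Tuple[float, float]]:
    """Pinhole projection, assuming identity camera rotation and camera center cam_t."""
    uv = []
    for x, y, z in points:
        xc, yc, zc = x - cam_t[0], y - cam_t[1], z - cam_t[2]
        if zc <= 0:  # point is behind the camera: not visible
            continue
        uv.append((f * xc / zc, f * yc / zc))
    return uv


def simulate_observation(state: WorldState, cam_t: Point) -> List[Tuple[float, float]]:
    """Stand-in for a learned observation model: returns only the projected geometry condition."""
    return project(state.points, cam_t)


def update_state(state: WorldState, obj: str, delta: Point) -> WorldState:
    """Stand-in state update: rigidly translate the points of the manipulated object."""
    new_points = [
        (x + delta[0], y + delta[1], z + delta[2]) if label == obj else (x, y, z)
        for (x, y, z), label in zip(state.points, state.labels)
    ]
    return WorldState(new_points, list(state.labels))  # original state is left untouched


def rollout(state: WorldState, actions):
    """Closed loop: observe from the current state, then write the interaction back into it."""
    frames = []
    for cam_t, obj, delta in actions:
        frames.append(simulate_observation(state, cam_t))
        state = update_state(state, obj, delta)
    return state, frames


# Usage: a table point and a cup point; the cup is lifted, then the camera moves back.
state = WorldState([(0, 0, 2), (1, 0, 2)], ["table", "cup"])
actions = [((0, 0, 0), "cup", (0, 1, 0)), ((0, 0, -2), "cup", (0, 0, 0))]
final, frames = rollout(state, actions)
print(final.points)  # [(0, 0, 2), (1, 1, 2)]
print(frames[1])     # [(0.0, 0.0), (0.25, 0.25)]
```

The second frame is rendered from the updated state, so the lifted cup stays lifted when the viewpoint changes; a static-scene simulator would re-render the cup at its original position.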


Related papers