Fugu-MT Paper Translation (Abstract): UniCon3R: Contact-aware 3D Human-Scene Reconstruction from Monocular Video

Paper overview: UniCon3R: Contact-aware 3D Human-Scene Reconstruction from Monocular Video

arxiv url: http://arxiv.org/abs/2604.19923v1
Date: Tue, 21 Apr 2026 19:06:50 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:10.754415
Title: UniCon3R: Contact-aware 3D Human-Scene Reconstruction from Monocular Video
Title (reference translation): UniCon3R: Reconstructing 3D Human Scenes from Monocular Video
Authors: Tanuj Sur, Shashank Tripathi, Nikos Athanasiou, Ha Linh Nguyen, Kai Xu, Michael J. Black, Angela Yao
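Abstract summary: We introduce UniCon3R, a unified feed-forward framework for online human-scene 4D reconstruction from monocular video. It models interaction by inferring 3D contact from the human pose and scene geometry. This enables UniCon3R to jointly recover high-fidelity scene geometry and spatially aligned 3D humans.
Reference score (independently computed attention metric): 82.5562736830041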
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce UniCon3R (Unified Contact-aware 3D Reconstruction), a unified feed-forward framework for online human-scene 4D reconstruction from monocular videos. Recent feed-forward methods enable real-time world-coordinate human motion and scene reconstruction, but they often produce physically implausible artifacts such as bodies floating above the ground or penetrating parts of the scene. The key reason is that existing approaches fail to model physical interactions between the human and the environment. A natural next step is to predict human-scene contact as an auxiliary output -- yet we find this alone is not sufficient: contact must actively correct the reconstruction. To address this, we explicitly model interaction by inferring 3D contact from the human pose and scene geometry and use the contact as a corrective cue for generating the final pose. This enables UniCon3R to jointly recover high-fidelity scene geometry and spatially aligned 3D humans within the scene. Experiments on standard human-centric video benchmarks such as RICH, EMDB, 3DPW and SLOPER4D show that UniCon3R outperforms state-of-the-art baselines on physical plausibility and global human motion estimation while achieving real-time online inference. We experimentally demonstrate that contact serves as a powerful internal prior rather than just an external metric, thus establishing a new paradigm for physically grounded joint human-scene reconstruction. Project page is available at https://surtantheta.github.io/UniCon3R .
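Abstract (reference translation): UniCon3R (Unified Contact-aware 3D Reconstruction) is a unified feed-forward framework for online human-scene 4D reconstruction from monocular video. Recent feed-forward methods make real-time world-coordinate human motion and scene reconstruction possible, but they often produce physically implausible artifacts, such as bodies floating above the ground or penetrating parts of the scene. The main reason is that existing approaches do not model the physical interaction between the human and the environment. A natural next step is to predict human-scene contact as an auxiliary output, but this alone is not sufficient: contact must actively correct the reconstruction. We therefore model interaction explicitly by inferring 3D contact from the human pose and scene geometry and using that contact as a corrective cue for generating the final pose. This enables UniCon3R to jointly recover high-fidelity scene geometry and spatially aligned 3D humans. In experiments on standard human-centric video benchmarks such as RICH, EMDB, 3DPW and SLOPER4D, UniCon3R outperforms state-of-the-art baselines on physical plausibility and global human motion estimation while achieving real-time online inference. We show experimentally that contact acts as a powerful internal prior rather than merely an external metric, establishing a new paradigm for physically grounded joint human-scene reconstruction. The project page is available at https://surtantheta.github.io/UniCon3R .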
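As a rough illustration of the abstract's point that contact should correct the reconstruction rather than only be reported, the toy sketch below snaps a body onto a flat ground plane using per-vertex contact probabilities. It is not UniCon3R's method: the flat ground at a known height, the `correct_height` helper, the 0.5 threshold, the y-up convention, and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, z) with y as the up axis (assumed)


def correct_height(
    vertices: List[Vertex],
    contact_prob: List[float],
    ground_y: float = 0.0,
    threshold: float = 0.5,
) -> List[Vertex]:
    """Shift the body vertically so likely-contact vertices rest on the ground.

    Hypothetical helper, not from the paper: the scene is reduced to a flat
    plane at height `ground_y`, and the correction is one vertical offset.
    """
    if len(vertices) != len(contact_prob):
        raise ValueError("one contact probability is required per vertex")

    # Keep only vertices whose predicted contact probability passes the threshold.
    in_contact = [(v, p) for v, p in zip(vertices, contact_prob) if p >= threshold]
    if not in_contact:
        return list(vertices)  # no contact evidence, leave the body unchanged

    # Probability-weighted mean gap between contact vertices and the ground.
    # Positive gap: the body floats. Negative gap: the body penetrates.
    total_weight = sum(p for _, p in in_contact)
    gap = sum(p * (v[1] - ground_y) for v, p in in_contact) / total_weight

    # Apply the same vertical offset to every vertex.
    return [(x, y - gap, z) for x, y, z in vertices]


# A "floating" body: both foot vertices hover 0.1 above the ground.
body = [(0.0, 0.1, 0.0), (0.2, 0.1, 0.0), (0.1, 1.7, 0.0)]
probs = [0.9, 0.8, 0.0]  # feet likely in contact, head not
corrected = correct_height(body, probs)
```

Here the predicted contact drives the correction: the feet are moved onto the plane and the rest of the body follows by the same offset. A real system would work with full scene geometry and a learned pose update rather than a single vertical shift.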

Related papers