Fugu-MT Paper Translation (Summary): Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos

Paper summary: Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos

arxiv url: http://arxiv.org/abs/2605.14462v1
Date: Thu, 14 May 2026 06:56:57 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.675206
Title: Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos
Title (reference translation): Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos
Authors: Yubo Zhao, Yujin Chai, Yunao Dong, Chengfeng Zhao, Zijiao Zeng, Yuan Liu, Chi-Keung Tang,
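Abstract summary: HOI reconstruction should not stop at tracking a human and an object, but should recover the relation that makes their motion a coherent interaction. We introduce $\textbf{HA-HOI}$, a framework for reconstructing physically plausible 4D HOI animation from in-the-wild monocular videos. Our work takes a step toward turning general monocular HOI videos into scalable demonstrations for humanoid-object behavior.
Reference score (independently computed attention score): 26.750787853601413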
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recovering 4D human-object interaction (HOI) from monocular video is a key step toward scalable 3D content creation, embodied AI, and simulation-based learning. Recent methods can reconstruct temporally coherent human and object trajectories, but these trajectories often remain visual artifacts while failing to preserve stable contact, functional manipulation, or physical plausibility when used as reference motions for humanoid-object simulation. This reveals a fundamental interaction gap: HOI reconstruction should not stop at tracking a human and an object, but should recover the relation that makes their motion a coherent interaction. We introduce $\textbf{HA-HOI}$, a framework for reconstructing physically plausible 4D HOI animation from in-the-wild monocular videos. Instead of treating the human and object as independent entities in an ambiguous monocular 3D space, we propose a $\textit{human-first, object-follow}$ formulation. The human motion is recovered as the interaction anchor, and the object is reconstructed, aligned, and refined relative to the human action. The resulting kinematic trajectory is then projected into a physics-based humanoid-object simulation, where it acts as a teacher trajectory for stable physical rollout. Across benchmark and in-the-wild videos, $\textbf{HA-HOI}$ improves human-object alignment, contact consistency, temporal stability, and simulation readiness over prior monocular HOI reconstruction methods. By moving beyond visually plausible trajectory recovery toward physically grounded interaction animation, our work takes a step toward turning general monocular HOI videos into scalable demonstrations for humanoid-object behavior. Project page: https://knoxzhao.github.io/real2sim_in_HOI/
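Abstract (reference translation): Recovering 4D human-object interaction (HOI) from monocular video is a key step toward scalable 3D content creation, embodied AI, and simulation-based learning. Recent methods can reconstruct temporally coherent human and object trajectories, but when used as reference motions for humanoid-object simulation, these trajectories often remain visual artifacts that fail to preserve stable contact, functional manipulation, or physical plausibility. This reveals a fundamental interaction gap: HOI reconstruction should not stop at tracking a human and an object, but should recover the relation that makes their motion a coherent interaction. We introduce $\textbf{HA-HOI}$, a framework for reconstructing physically plausible 4D HOI animation from in-the-wild monocular videos. Instead of treating the human and object as independent entities in an ambiguous monocular 3D space, we propose a $\textit{human-first, object-follow}$ formulation. The human motion is recovered as the interaction anchor, and the object is reconstructed, aligned, and refined relative to the human action. The resulting kinematic trajectory is then projected into a physics-based humanoid-object simulation, where it acts as a teacher trajectory for stable physical rollout. Across benchmark and in-the-wild videos, $\textbf{HA-HOI}$ improves human-object alignment, contact consistency, temporal stability, and simulation readiness over prior monocular HOI reconstruction methods. By moving beyond visually plausible trajectory recovery toward physically grounded interaction animation, our work takes a step toward turning general monocular HOI videos into scalable demonstrations for humanoid-object behavior. Project page: https://knoxzhao.github.io/real2sim_in_HOI/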
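
The abstract does not say how the object is aligned relative to the human. Purely as an illustration of what a human-anchored representation could look like, the following minimal sketch stores the object pose in the human root frame, so that any later correction to the human pose carries the object along with it. All function names (`make_pose`, `object_in_human_frame`, `object_follow`) and the use of 4x4 rigid transforms are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform from a yaw angle and a translation.

    Hypothetical helper for this example only.
    """
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


def object_in_human_frame(world_T_human, world_T_object):
    """Express the object pose relative to the human root (human_T_object)."""
    return np.linalg.inv(world_T_human) @ world_T_object


def object_follow(world_T_human_refined, human_T_object):
    """Re-anchor the object to an updated human pose.

    The relative pose human_T_object is preserved, so the object
    "follows" whatever correction was applied to the human.
    """
    return world_T_human_refined @ human_T_object


# Illustrative values: an initial human/object estimate for one frame.
world_T_human = make_pose(0.3, [1.0, 2.0, 0.0])
world_T_object = make_pose(-0.2, [1.5, 2.0, 1.0])
human_T_object = object_in_human_frame(world_T_human, world_T_object)

# Suppose the human pose is later refined; the object is moved with it.
world_T_human_refined = make_pose(0.35, [1.1, 2.0, 0.0])
world_T_object_refined = object_follow(world_T_human_refined, human_T_object)
```

Storing the relative transform rather than two independent world poses is one simple way to keep the human-object relation fixed while the anchor changes; the refinement and physics-based rollout stages described in the abstract are outside the scope of this sketch.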

List of related papers