Fugu-MT Paper Translation (Abstract): Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos

Paper overview: Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos

arxiv url: http://arxiv.org/abs/2509.24209v1
Date: Mon, 29 Sep 2025 02:47:14 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.70063
Title: Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos
Title (reference translation): Forge4D: Feed-Forward 4D Reconstruction and Interpolation from Sparse-View Videos
Authors: Yingdong Hu, Yisheng He, Jinnan Chen, Weihao Yuan, Kejie Qiu, Zehong Lin, Siyu Zhu, Zilong Dong, Jun Zhang
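Abstract summary: This work proposes a feed-forward 4D human reconstruction and interpolation model that efficiently reconstructs temporally aligned representations from uncalibrated sparse-view videos. A novel motion prediction module is designed to predict the dense motion of 3D Gaussians between two adjacent frames. Experiments demonstrate the model's effectiveness on both in-domain and out-of-domain datasets.
Reference score (independently computed attention score): 27.595035122927204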
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Instant reconstruction of dynamic 3D humans from uncalibrated sparse-view videos is critical for numerous downstream applications. Existing methods, however, are either limited by the slow reconstruction speeds or incapable of generating novel-time representations. To address these challenges, we propose Forge4D, a feed-forward 4D human reconstruction and interpolation model that efficiently reconstructs temporally aligned representations from uncalibrated sparse-view videos, enabling both novel view and novel time synthesis. Our model simplifies the 4D reconstruction and interpolation problem as a joint task of streaming 3D Gaussian reconstruction and dense motion prediction. For the task of streaming 3D Gaussian reconstruction, we first reconstruct static 3D Gaussians from uncalibrated sparse-view images and then introduce learnable state tokens to enforce temporal consistency in a memory-friendly manner by interactively updating shared information across different timestamps. For novel time synthesis, we design a novel motion prediction module to predict dense motions for each 3D Gaussian between two adjacent frames, coupled with an occlusion-aware Gaussian fusion process to interpolate 3D Gaussians at arbitrary timestamps. To overcome the lack of the ground truth for dense motion supervision, we formulate dense motion prediction as a dense point matching task and introduce a self-supervised retargeting loss to optimize this module. An additional occlusion-aware optical flow loss is introduced to ensure motion consistency with plausible human movement, providing stronger regularization. Extensive experiments demonstrate the effectiveness of our model on both in-domain and out-of-domain datasets. Project page and code at: https://zhenliuzju.github.io/huyingdong/Forge4D.
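Abstract (reference translation): Instant reconstruction of dynamic 3D humans from uncalibrated sparse-view videos is important for many downstream applications. Existing methods, however, are either limited by slow reconstruction speed or unable to generate novel-time representations. To address these challenges, we propose Forge4D, a feed-forward 4D human reconstruction and interpolation model. The model simplifies the 4D reconstruction and interpolation problem into a joint task of 3D Gaussian reconstruction and dense motion prediction. For 3D Gaussian reconstruction, static 3D Gaussians are first reconstructed from sparse-view images, and learnable state tokens are then introduced to achieve temporal consistency by interactively updating information shared across different timestamps. For novel time synthesis, we design a novel motion prediction module that predicts the dense motion of each 3D Gaussian between two adjacent frames, combined with an occlusion-aware Gaussian fusion process for interpolating 3D Gaussians at arbitrary timestamps. To overcome the lack of ground truth for dense motion supervision, dense motion prediction is formulated as a dense point matching task, and a self-supervised retargeting loss is introduced to optimize this module. An additional occlusion-aware optical flow loss is introduced to keep the motion consistent with plausible human movement, providing stronger regularization. Extensive experiments demonstrate the model's effectiveness on both in-domain and out-of-domain datasets. Project page and code: https://zhenliuzju.github.io/huyingdong/Forge4D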
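The abstract describes novel time synthesis as predicting a dense motion for each 3D Gaussian between two adjacent frames and then fusing Gaussians at an arbitrary intermediate timestamp. The following is a minimal sketch of that idea, not the authors' implementation. It assumes linear motion between frames and replaces the occlusion-aware fusion, whose details the abstract does not give, with a simple time-weighted opacity blend. The names `Gaussian`, `warp`, and `interpolate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical, heavily reduced 3D Gaussian: only a center and an opacity."""
    center: Vec3
    opacity: float


def warp(g: Gaussian, motion: Vec3, fraction: float, weight: float) -> Gaussian:
    """Move a Gaussian along `motion` by `fraction` and scale its opacity by `weight`."""
    center = tuple(c + fraction * m for c, m in zip(g.center, motion))
    return Gaussian(center=center, opacity=g.opacity * weight)


def interpolate(
    frame0: List[Gaussian],
    motion_fwd: List[Vec3],
    frame1: List[Gaussian],
    motion_bwd: List[Vec3],
    alpha: float,
) -> List[Gaussian]:
    """Build Gaussians at normalized time `alpha` in [0, 1] between two frames.

    Assumption: motion is linear, and fusion is a time-weighted union of both
    warped sets (a stand-in for the occlusion-aware fusion named in the abstract).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Gaussians from the earlier frame move forward and fade out as alpha grows.
    from_prev = [warp(g, m, alpha, 1.0 - alpha) for g, m in zip(frame0, motion_fwd)]
    # Gaussians from the later frame move backward and fade in as alpha grows.
    from_next = [warp(g, m, 1.0 - alpha, alpha) for g, m in zip(frame1, motion_bwd)]
    return from_prev + from_next
```

With `alpha = 0` the result keeps the first frame at full opacity and the second frame at zero opacity, and with `alpha = 1` the roles are reversed, so the sketch reproduces the two input frames at the endpoints.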

Related papers