Fugu-MT Paper Translation (Summary): UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images

Paper summary: UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images

arxiv url: http://arxiv.org/abs/2602.24290v2
Date: Thu, 05 Mar 2026 05:12:06 GMT
Status: Translation complete
Last updated in system: 2026-03-06 15:25:24.066424
Title: UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images
Title (reference translation): UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images
Authors: Junhwa Hur, Charles Herrmann, Songyou Peng, Philipp Henzler, Zeyu Ma, Todd Zickler, Deqing Sun
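Abstract summary: UFO-4D is a unified feedforward framework for reconstructing a dense, explicit 4D representation from a pair of unposed images. UFO-4D directly estimates dynamic 3D Gaussians, enabling joint and consistent estimation of 3D geometry, 3D motion, and camera pose. The representation also enables high-fidelity 4D synthesis across novel views and time.
Reference score (independently calculated attention score): 43.53497980792498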
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dense 4D reconstruction from unposed images remains a critical challenge, with current methods relying on slow test-time optimization or fragmented, task-specific feedforward models. We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. UFO-4D directly estimates dynamic 3D Gaussian Splats, enabling the joint and consistent estimation of 3D geometry, 3D motion, and camera pose in a feedforward manner. Our core insight is that differentiably rendering multiple signals from a single Dynamic 3D Gaussian representation offers major training advantages. This approach enables a self-supervised image synthesis loss while tightly coupling appearance, depth, and motion. Since all modalities share the same geometric primitives, supervising one inherently regularizes and improves the others. This synergy overcomes data scarcity, allowing UFO-4D to outperform prior work by up to 3 times in joint geometry, motion, and camera pose estimation. Our representation also enables high-fidelity 4D interpolation across novel views and time. Please visit our project page for visual results: https://ufo-4d.github.io/
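Abstract (reference translation): Dense 4D reconstruction from unposed images remains a critical challenge; current methods rely on slow test-time optimization or on fragmented, task-specific feedforward models. UFO-4D is a unified feedforward framework that reconstructs a dense, explicit 4D representation from a pair of unposed images. It directly estimates dynamic 3D Gaussian splats, enabling joint and consistent estimation of 3D geometry, 3D motion, and camera pose in a feedforward manner. The core insight is that differentiably rendering multiple signals from a single dynamic 3D Gaussian representation brings major training advantages. This approach enables a self-supervised image synthesis loss while tightly coupling appearance, depth, and motion. Because all modalities share the same geometric primitives, supervising one inherently regularizes and improves the others. This synergy overcomes data scarcity, and UFO-4D outperforms prior work by up to 3 times in joint estimation of geometry, motion, and camera pose. The representation also enables high-fidelity 4D interpolation across novel views and time. See the project page for visual results.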
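
The abstract's claim that modalities sharing the same geometric primitives constrain one another can be illustrated with a toy sketch. This is not the UFO-4D implementation: the `DynamicGaussian` class, the linear motion model between the two frames, and the pinhole camera with a default focal length of 1 are all assumptions made for illustration, and a real renderer would differentiably composite many Gaussians per pixel rather than read values off a single center.

```python
from dataclasses import dataclass


@dataclass
class DynamicGaussian:
    # Hypothetical minimal primitive (an assumption, not the paper's parameterization).
    center: tuple  # (x, y, z) in camera coordinates at t = 0
    motion: tuple  # (dx, dy, dz) displacement from t = 0 to t = 1


def center_at(g, t):
    """Linearly interpolate the center to time t in [0, 1] (assumed motion model)."""
    return tuple(c + t * m for c, m in zip(g.center, g.motion))


def project(point, focal=1.0):
    """Pinhole projection of a camera-space point onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def derived_signals(g, focal=1.0):
    """Read depth, 3D scene flow, and 2D optical flow off the same primitive."""
    p0 = center_at(g, 0.0)
    p1 = center_at(g, 1.0)
    u0, v0 = project(p0, focal)
    u1, v1 = project(p1, focal)
    return {
        "depth": p0[2],                      # z of the center at t = 0
        "scene_flow": g.motion,              # 3D motion of the primitive
        "optical_flow": (u1 - u0, v1 - v0),  # induced 2D motion in the image
    }


# One Gaussian 2 units in front of the camera, moving 1 unit along x.
g = DynamicGaussian(center=(0.0, 0.0, 2.0), motion=(1.0, 0.0, 0.0))
signals = derived_signals(g)
midpoint = center_at(g, 0.5)  # position at an intermediate time
```

Because depth, scene flow, and optical flow are all functions of the same `center` and `motion` values, a loss on any one of them would move parameters that the other two also depend on, which is the coupling the abstract describes.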

Related papers