Fugu-MT Paper Translation (Abstract): No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos

Paper overview: No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos

arxiv url: http://arxiv.org/abs/2605.22190v1
Date: Thu, 21 May 2026 08:57:21 GMT
Status: Translation complete
System update date: 2026-05-22 20:14:18.540938
Title: No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos
Title (reference translation): Feed-Forward Dynamic Gaussians for 4D Video
Authors: Matteo Balice, Yanik Kunzi, Chenyangguang Zhang, Matteo Matteucci, Marc Pollefeys, Sungwhan Hong
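Abstract summary: NoPo4D is the first feed-forward system to handle dynamic content, multi-view input, and unknown camera poses in a single pass. On four multi-view dynamic benchmarks, NoPo4D consistently outperforms prior feed-forward baselines.
Reference score (independently computed attention score): 49.02659692876764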
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent feed-forward 3D Gaussian splatting methods have made dramatic progress on individual aspects of 3D scene reconstruction, but no existing method jointly addresses dynamic content, multi-view input, and unknown camera poses in a single feed-forward pass. Methods that handle dynamics either require accurate camera poses or accept only monocular input; pose-free multi-view methods address only static scenes; and per-scene optimization methods bridge some of these gaps, but at a cost of minutes to hours per scene. We introduce NoPo4D, the first feed-forward system that addresses this empty quadrant. Building on a pretrained geometry backbone and recent 4D Gaussian frameworks, NoPo4D introduces a velocity decomposition that splits Gaussian motion into per-pixel image-plane shifts and depth changes, allowing direct supervision from pseudo ground-truth optical flow on the 2D component. This sidesteps both the differentiable rendering that couples prior posed methods to pose accuracy and the 3D motion ground truth that prior pose-free methods require. The system is rounded out by a bidirectional motion encoder for cross-view and cross-frame feature aggregation, and view-dependent opacity that mitigates cross-view and cross-timestep Gaussian misalignments. On four multi-view dynamic benchmarks, NoPo4D consistently outperforms prior feed-forward baselines and, with an optional post-optimization stage, surpasses per-scene optimization methods while running orders of magnitude faster.
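Abstract (reference translation): Recent feed-forward 3D Gaussian splatting methods have advanced dramatically on individual aspects of 3D scene reconstruction, yet no method handles dynamic content, multi-view input, and unknown camera poses together in a single feed-forward pass. Methods that handle dynamics either need accurate camera poses or accept only monocular input; pose-free multi-view methods cover only static scenes; and per-scene optimization methods close some of these gaps but cost minutes to hours per scene. NoPo4D is introduced as the first feed-forward system to address this empty quadrant. Built on a pretrained geometry backbone and recent 4D Gaussian frameworks, NoPo4D introduces a velocity decomposition that splits Gaussian motion into per-pixel image-plane shifts and depth changes, so the 2D component can be supervised directly with pseudo ground-truth optical flow. This avoids both the differentiable rendering that ties prior posed methods to pose accuracy and the 3D motion ground truth that prior pose-free methods require. The system is completed by a bidirectional motion encoder for cross-view and cross-frame feature aggregation, and by view-dependent opacity that mitigates Gaussian misalignments across views and timesteps. On four multi-view dynamic benchmarks, NoPo4D consistently outperforms prior feed-forward baselines and, with an optional post-optimization stage, surpasses per-scene optimization methods while running orders of magnitude faster.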
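The velocity decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so everything below is an assumption for illustration only: a pinhole camera, motion expressed in the camera frame, and the hypothetical names `Pinhole`, `unproject`, `compose_velocity`, and `flow_l1`. It is not the authors' implementation. The idea shown is that a per-pixel image-plane shift plus a depth change determines a 3D displacement, while the 2D shift alone can be compared against pseudo ground-truth optical flow.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    """Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject(cam, u, v, depth):
    """Lift pixel (u, v) at the given depth to a 3D point in the camera frame."""
    x = (u - cam.cx) / cam.fx * depth
    y = (v - cam.cy) / cam.fy * depth
    return (x, y, depth)


def compose_velocity(cam, u, v, depth, du, dv, ddepth):
    """Combine a 2D image-plane shift (du, dv) and a depth change into a 3D displacement.

    Assumption: the displacement is the difference between the unprojected
    end point and the unprojected start point, both in the same camera frame.
    """
    p0 = unproject(cam, u, v, depth)
    p1 = unproject(cam, u + du, v + dv, depth + ddepth)
    return tuple(b - a for a, b in zip(p0, p1))


def flow_l1(pred_shift, pseudo_gt_flow):
    """L1 distance between a predicted 2D shift and pseudo ground-truth flow at one pixel."""
    return abs(pred_shift[0] - pseudo_gt_flow[0]) + abs(pred_shift[1] - pseudo_gt_flow[1])


if __name__ == "__main__":
    cam = Pinhole(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
    # A pixel at the principal point, 2 units deep, shifting 10 px right with no depth change.
    print(compose_velocity(cam, 50.0, 50.0, 2.0, 10.0, 0.0, 0.0))
    # Supervision touches only the 2D component, so no 3D motion labels are needed.
    print(flow_l1((10.0, 0.0), (9.5, 0.5)))
```

In this sketch the loss depends only on `(du, dv)`, which is why 2D optical flow is enough to supervise that part of the motion; how the depth-change component is trained is not stated in the abstract and is left out here.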

Related papers