Fugu-MT Paper Translation (Summary): SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

Paper Summary: SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

arxiv url: http://arxiv.org/abs/2510.26796v1
Date: Thu, 30 Oct 2025 17:59:39 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.975461
Title: SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting
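Title (reference translation): SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting
Authors: Dongyue Lu, Ao Liang, Tianxin Huang, Xiao Fu, Yuyang Zhao, Baorui Ma, Liang Pan, Wei Yin, Lingdong Kong, Wei Tsang Ooi, Ziwei Liu
Abstract summary: SEE4D is a pose-free, trajectory-to-camera framework for 4D world modeling from casual videos. A view-conditional video inpainting model is trained to learn a robust geometry prior by denoising realistically synthesized warped images. SEE4D is validated on cross-view video generation and sparse reconstruction benchmarks.
Reference score (site-computed attention score): 83.5106058182799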
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Immersive applications call for synthesizing spatiotemporal 4D content from casual videos without costly 3D supervision. Existing video-to-4D methods typically rely on manually annotated camera poses, which are labor-intensive and brittle for in-the-wild footage. Recent warp-then-inpaint approaches mitigate the need for pose labels by warping input frames along a novel camera trajectory and using an inpainting model to fill missing regions, thereby depicting the 4D scene from diverse viewpoints. However, this trajectory-to-trajectory formulation often entangles camera motion with scene dynamics and complicates both modeling and inference. We introduce SEE4D, a pose-free, trajectory-to-camera framework that replaces explicit trajectory prediction with rendering to a bank of fixed virtual cameras, thereby separating camera control from scene modeling. A view-conditional video inpainting model is trained to learn a robust geometry prior by denoising realistically synthesized warped images and to inpaint occluded or missing regions across virtual viewpoints, eliminating the need for explicit 3D annotations. Building on this inpainting core, we design a spatiotemporal autoregressive inference pipeline that traverses virtual-camera splines and extends videos with overlapping windows, enabling coherent generation at bounded per-step complexity. We validate SEE4D on cross-view video generation and sparse reconstruction benchmarks. Across quantitative metrics and qualitative assessments, our method achieves superior generalization and improved performance relative to pose- or trajectory-conditioned baselines, advancing practical 4D world modeling from casual videos.
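Abstract (reference translation): Immersive applications require synthesizing spatiotemporal 4D content from casual videos without expensive 3D supervision. Existing video-to-4D methods rely on manually annotated camera poses. Recent warp-then-inpaint approaches ease the need for pose labels by warping input frames along a new camera trajectory and filling missing regions with an inpainting model, depicting the 4D scene from various viewpoints. However, this trajectory-to-trajectory formulation often entangles scene dynamics with camera motion, complicating both modeling and inference. SEE4D is a pose-free, trajectory-to-camera framework that replaces explicit trajectory prediction with rendering to a bank of fixed virtual cameras, separating camera control from scene modeling. A view-conditional video inpainting model is trained to learn a robust geometry prior by denoising realistically synthesized warped images and by inpainting occluded or missing regions across virtual viewpoints, removing the need for explicit 3D annotations. Building on this inpainting core, we design a spatiotemporal autoregressive inference pipeline that traverses virtual-camera splines and extends videos with overlapping windows, enabling coherent generation at bounded per-step cost. We validate SEE4D on cross-view video generation and sparse reconstruction benchmarks. Across quantitative metrics and qualitative assessments, the proposed method achieves better generalization and improved performance compared with pose- or trajectory-conditioned baselines, advancing practical 4D world modeling from casual videos.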
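
The abstract states only that inference "extends videos with overlapping windows, enabling coherent generation at bounded per-step complexity." The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' code. Frames are plain floats instead of images, `generate` is a hypothetical stand-in for the view-conditional inpainting model, and each step is assumed to condition on the last `overlap` already-generated frames. The names `extend_autoregressively`, `ramp`, `window`, and `overlap` are illustrative only.

```python
from typing import Callable, List, Sequence

# Frames are plain floats so the sketch runs anywhere; a real pipeline would
# pass image tensors for a given virtual camera instead (assumption).
Frame = float
Generator = Callable[[Sequence[Frame], int], List[Frame]]


def extend_autoregressively(
    seed: Sequence[Frame],
    total: int,
    window: int,
    overlap: int,
    generate: Generator,
) -> List[Frame]:
    """Grow `seed` to `total` frames using overlapping windows.

    Each step passes at most `overlap` already-generated frames as context and
    asks for at most `window - overlap` new frames, so a single call never
    touches more than `window` frames, however long the output becomes.
    """
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    if not seed:
        raise ValueError("seed must contain at least one frame")
    frames = list(seed)
    stride = window - overlap  # new frames committed per step
    while len(frames) < total:
        # frames[-0:] would return the whole list, so handle overlap == 0 explicitly
        context = frames[-overlap:] if overlap else []
        n_new = min(stride, total - len(frames))
        new = generate(context, n_new)
        if len(new) != n_new:
            raise ValueError("generate returned the wrong number of frames")
        frames.extend(new)  # committed frames are never regenerated
    return frames


def ramp(context: Sequence[Frame], n_new: int) -> List[Frame]:
    """Hypothetical stand-in for the inpainting model: continues a linear
    ramp from the last context frame."""
    start = context[-1] if context else 0.0
    return [start + i + 1 for i in range(n_new)]


if __name__ == "__main__":
    print(extend_autoregressively([0.0, 1.0], total=7, window=4, overlap=2, generate=ramp))
    # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Because each call sees at most `window` frames (`overlap` context frames plus up to `window - overlap` new ones), the per-step cost stays fixed as `total` grows. How SEE4D actually blends overlapping frames, or how it interleaves the time axis with traversal of the virtual-camera splines, is not specified in the abstract.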

Related Papers