Fugu-MT Paper Translation (Abstract): S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix

Paper overview: S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix

arxiv url: http://arxiv.org/abs/2508.08048v1
Date: Mon, 11 Aug 2025 14:50:03 GMT
Status: Translation complete
System update date: 2025-08-12 21:23:29.154811
Title: S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix
Title (reference translation): S^2VG: Generating Stereoscopic and Spatial Video via Denoising Frame Matrix
Authors: Peng Dai, Feitong Tan, Qiangeng Xu, Yihua Huang, David Futschik, Ruofei Du, Sean Fanello, Yinda Zhang, Xiaojuan Qi
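Abstract summary: This work proposes a pose-free and training-free method that uses an off-the-shelf monocular video generation model to produce immersive 3D videos. The method first warps the generated monocular video to pre-defined camera viewpoints using estimated depth information, and then applies a novel frame matrix inpainting framework. Its effectiveness is validated through experiments with various generative models, including Sora, Lumiere, WALT, and Zeroscope.
Reference score (independently computed attention score): 60.060882467801484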
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While video generation models excel at producing high-quality monocular videos, generating 3D stereoscopic and spatial videos for immersive applications remains an underexplored challenge. We present a pose-free and training-free method that leverages an off-the-shelf monocular video generation model to produce immersive 3D videos. Our approach first warps the generated monocular video into pre-defined camera viewpoints using estimated depth information, then applies a novel frame matrix inpainting framework. This framework utilizes the original video generation model to synthesize missing content across different viewpoints and timestamps, ensuring spatial and temporal consistency without requiring additional model fine-tuning. Moreover, we develop a dual-update scheme that further improves the quality of video inpainting by alleviating the negative effects propagated from disoccluded areas in the latent space. The resulting multi-view videos are then adapted into stereoscopic pairs or optimized into 4D Gaussians for spatial video synthesis. We validate the efficacy of our proposed method by conducting experiments on videos from various generative models, such as Sora, Lumiere, WALT, and Zeroscope. The experiments demonstrate that our method achieves a significant improvement over previous methods. Project page at: https://daipengwa.github.io/S-2VG_ProjectPage/
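Abstract (reference translation): Video generation models excel at producing high-quality monocular videos, but generating stereoscopic and spatial videos for immersive applications remains an open challenge. This work proposes a pose-free and training-free method that uses an off-the-shelf monocular video generation model to produce immersive 3D videos. The method first warps the generated monocular video to pre-defined camera viewpoints using estimated depth information, and then applies a novel frame matrix inpainting framework. This framework uses the original video generation model to synthesize missing content across different viewpoints and timestamps, ensuring spatial and temporal consistency without additional model fine-tuning. In addition, a dual-update scheme is developed that improves video inpainting quality by mitigating the negative effects propagated from disoccluded regions in the latent space. The resulting multi-view videos are either adapted into stereoscopic pairs or optimized into 4D Gaussians for spatial video synthesis. The method's effectiveness is validated through experiments with various generative models, including Sora, Lumiere, WALT, and Zeroscope. The experiments show that the method improves substantially over previous methods. Project page: https://daipengwa.github.io/S-2VG_ProjectPage/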
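The depth-based warping step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a purely horizontal camera shift, a pinhole model in which disparity equals focal * baseline / depth, and nearest-pixel forward splatting with a z-buffer. The function name `warp_to_view` and its parameters are hypothetical. The returned hole mask marks the pixels that no source pixel reaches, which is the kind of missing content an inpainting stage would have to fill.

```python
def warp_to_view(image, depth, focal, baseline):
    """Forward-warp an image to a camera shifted right by `baseline`.

    image: list of rows, each a list of pixel values
    depth: same shape as image, strictly positive depths
    Returns (warped, holes); holes[y][x] is True where nothing landed.
    Assumption: horizontal shift only, disparity = focal * baseline / depth.
    """
    height, width = len(image), len(image[0])
    warped = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            disparity = focal * baseline / z
            tx = int(round(x - disparity))  # nearer pixels shift further left
            # Keep the nearest surface when several pixels land on one target.
            if 0 <= tx < width and z < zbuf[y][tx]:
                zbuf[y][tx] = z
                warped[y][tx] = image[y][x]
    holes = [[pixel is None for pixel in row] for row in warped]
    return warped, holes


# One scanline: a near object (depth 1) in front of a far background (depth 2).
image = [[10, 20, 30, 40, 50, 60]]
depth = [[2, 2, 1, 1, 2, 2]]
warped, holes = warp_to_view(image, depth, focal=1.0, baseline=2.0)
print(warped)  # [[30, 40, None, 50, 60, None]]
print(holes)   # [[False, False, True, False, False, True]]
```

In this toy scanline the hole at index 2 is a disocclusion exposed behind the near object, and the hole at index 5 is content that lies outside the original frame.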

