Fugu-MT Paper Translation (Abstract): 3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation

Paper overview: 3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation

arxiv url: http://arxiv.org/abs/2605.16795v1
Date: Sat, 16 May 2026 03:56:52 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:47.025041
Title: 3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation
Title (reference translation): 3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation
Authors: Hwidong Kim, Yunho Kim, Tae-Kyun Kim
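Abstract summary: 3DPhysVideo is a novel training-free pipeline that generates physically realistic videos from a single image. It uses an image-to-video (I2V) flow model, guided by rendered point clouds, as a novel view synthesizer to reconstruct 360-degree 3D scene geometry. In diverse experiments including multi-object and fluid interaction scenes, the method bridges the gap from a single image to physically plausible videos.
Reference score (independently computed attention score): 13.662206166615098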
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video generative models have made remarkable progress, yet they often yield visual artifacts that violate grounding in physical dynamics. Recent works such as PhysGen3D tackle single image-to-3D physics through mesh reconstruction and Physically-Based Rendering, but challenges remain in modeling fluid dynamics, multi-object interactions and photorealism. This work introduces 3DPhysVideo, a novel training-free pipeline that generates physically realistic videos from a single image. We repurpose an off-the-shelf video model for two stages. First, we use it as a novel view synthesizer to reconstruct complete 360-degree 3D scene geometry by guiding the image-to-video (I2V) flow model with rendered point clouds. Second, after applying physics solvers to this geometry, the physically simulated point cloud is used to guide the same I2V flow model to synthesize final, high-quality videos. Consistency-Guided Flow SDE, which decomposes the predicted velocity of the I2V flow model into denoising and consistency bias, enforces consistency to the conditional inputs, allowing us to effectively repurpose the model for both 3D reconstruction and simulation-guided video generation. In diverse experiments including multi-object and fluid interaction scenes, our method successfully bridges the gap from single images to physically plausible videos, while remaining efficient to run on a single consumer GPU. It outperforms state-of-the-art baselines on GPT-based scores, the VideoPhy benchmark and human evaluation.
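Abstract (reference translation): Video generative models have made remarkable progress, yet they often produce visual artifacts that violate physical dynamics. Recent works such as PhysGen3D address single-image-to-3D physics through mesh reconstruction and physically based rendering, but challenges remain in modeling fluid dynamics, multi-object interactions, and photorealism. 3DPhysVideo is a novel training-free pipeline that generates physically realistic videos from a single image. It repurposes an off-the-shelf video model in two stages. First, the image-to-video (I2V) flow model is guided by rendered point clouds and used as a novel view synthesizer to reconstruct 360-degree 3D scene geometry. Second, after physics solvers are applied to this geometry, the physically simulated point cloud guides the same I2V flow model to synthesize the final high-quality video. Consistency-Guided Flow SDE, which decomposes the predicted velocity of the I2V flow model into denoising and consistency bias, enforces consistency with the conditional inputs, allowing the model to be repurposed effectively for both 3D reconstruction and simulation-guided video generation. In diverse experiments including multi-object and fluid interaction scenes, the method bridges the gap from a single image to physically plausible videos. It outperforms state-of-the-art baselines on GPT-based scores, the VideoPhy benchmark, and human evaluation.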

Related papers