Fugu-MT Paper Translation (Abstract): PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance

Paper overview: PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance

arxiv url: http://arxiv.org/abs/2603.18639v1
Date: Thu, 19 Mar 2026 09:03:06 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:06.048229
Title: PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance
Title (reference translation): PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance
Authors: Cong Wang, Hanxin Zhu, Xiao Tang, Jiayi Luo, Xin Jin, Long Chen, Fei-Yue Wang, Zhibo Chen
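Abstract summary: We propose PhysVideo, a framework for generating physics-aware videos. In the first stage, Phys4View captures the influence of physical attributes on motion dynamics and enhances spatio-temporal consistency. In the second stage, VideoSyn uses the generated videos as guidance and learns the interactions between foreground dynamics and background context for controllable video synthesis.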
Reference score (independently calculated attention score): 31.104339154260312
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in video generation has led to substantial improvements in visual fidelity, yet ensuring physically consistent motion remains a fundamental challenge. Intuitively, this limitation can be attributed to the fact that real-world object motion unfolds in three-dimensional space, while video observations provide only partial, view-dependent projections of such dynamics. To address these issues, we propose PhysVideo, a two-stage framework that first generates physics-aware orthogonal foreground videos and then synthesizes full videos with background. In the first stage, Phys4View leverages physics-aware attention to capture the influence of physical attributes on motion dynamics, and enhances spatio-temporal consistency by incorporating geometry-enhanced cross-view attention and temporal attention. In the second stage, VideoSyn uses the generated foreground videos as guidance and learns the interactions between foreground dynamics and background context for controllable video synthesis. To support training, we construct PhysMV, a dataset containing 40K scenes, each consisting of four orthogonal viewpoints, resulting in a total of 160K video sequences. Extensive experiments demonstrate that PhysVideo significantly improves physical realism and spatial-temporal coherence over existing video generation methods. Home page: https://anonymous.4open.science/w/Phys4D/.
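Abstract (reference translation): Recent progress in video generation has greatly improved visual fidelity, but ensuring physically consistent motion remains a fundamental challenge. Intuitively, this limitation can be attributed to the fact that real-world object motion unfolds in three-dimensional space, while video observations provide only partial, view-dependent projections of such dynamics. To address these issues, we propose PhysVideo, a two-stage framework. In the first stage, Phys4View leverages physics-aware attention to capture the influence of physical attributes on motion dynamics, and enhances spatio-temporal consistency by incorporating geometry-enhanced cross-view attention and temporal attention. In the second stage, VideoSyn uses the generated foreground videos as guidance and learns the interactions between foreground dynamics and background context for controllable video synthesis. To support training, we construct PhysMV, a dataset of 40K scenes, each consisting of four orthogonal viewpoints. Extensive experiments show that PhysVideo significantly improves physical realism and spatial-temporal coherence over existing video generation methods. Home page: https://anonymous.4open.science/w/Phys4D/.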

List of related papers