Fugu-MT Paper Translation (Abstract): PhyWorld: Physics-Faithful World Model for Video Generation

Paper abstract: PhyWorld: Physics-Faithful World Model for Video Generation

arxiv url: http://arxiv.org/abs/2605.19242v1
Date: Tue, 19 May 2026 01:28:52 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:09.058305
Title: PhyWorld: Physics-Faithful World Model for Video Generation
Title (reference translation): PhyWorld: A Physics-Faithful World Model for Video Generation
Authors: Pu Zhao, Juyi Lin, Timothy Rupprecht, Arash Akbari, Chence Yang, Rahul Chowdhury, Elaheh Motamedi, Arman Akbari, Yumei He, Chen Wang, Geng Yuan, Weiwei Chen, Yanzhi Wang
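Abstract summary: This paper proposes PhyWorld, a video generation world model that produces temporally coherent and physically faithful scene continuations. In the first stage, flow matching fine-tuning improves video-to-video continuation, encouraging stable visual attributes and coherent motion dynamics. In the second stage, Direct Preference Optimization (DPO) over physics preference pairs aligns the generated dynamics with physical principles, guiding the model toward outputs with higher physical plausibility.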
Reference score (independently computed attention score): 30.11795285799137
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, using them as world simulators requires physically faithful video continuations, namely, generated videos that preserve the physical state implied by the conditioning input, and evolve in ways consistent with basic physical principles. We propose PhyWorld, a video generation world model designed to produce temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, we improve video-to-video continuation with flow matching fine-tuning, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, we align generated dynamics with physical principles using Direct Preference Optimization (DPO) over physics preference pairs, guiding the model toward outputs with higher physical plausibility. To evaluate PhyWorld, we use both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. Experiments show that PhyWorld improves video consistency, achieving an average score of 0.769 on VBench compared with 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on our physical-faithfulness benchmark compared with 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.
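Abstract (reference translation): World simulators can provide safe, scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising foundation for such simulators because they can generate diverse, realistic visual futures. Using them as world simulators, however, requires physically faithful video continuations: videos that preserve the physical state implied by the conditioning input and evolve consistently with basic physical principles. This paper proposes PhyWorld, a video generation world model that achieves temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, flow matching fine-tuning improves video-to-video continuation, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, Direct Preference Optimization (DPO) over physics preference pairs aligns the generated dynamics with physical principles, guiding the model toward outputs with higher physical plausibility. PhyWorld is evaluated on both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. In experiments, PhyWorld improves video consistency, with an average VBench score of 0.769 versus 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on the authors' physical-faithfulness benchmark versus 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.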
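
The two training signals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objectives, so the following is an assumption-laden illustration rather than the authors' method: `flow_matching_loss` assumes the common linear interpolation path between a noise sample and a data sample, and `dpo_loss` is the standard DPO loss over one preference pair, assuming log-likelihoods are available for both the trained model and a frozen reference model. Video models often need a surrogate for those log-likelihoods, which the abstract does not specify. All function names, argument names, and the default `beta=0.1` are hypothetical.

```python
import math


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared flow matching error for one sample at time t in [0, 1].

    Assumption: linear path x_t = (1 - t) * x0 + t * x1, where x0 is noise
    and x1 is data, so the target velocity is x1 - x0.
    predict_velocity is a hypothetical callable (x_t, t) -> list of floats.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    pred = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_* are log-likelihoods under the model being trained; ref_logp_*
    are the same quantities under a frozen reference model.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) equals softplus(-margin); computed stably here.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch, a pair on which the trained model and the reference model agree gives a DPO loss of log 2 (about 0.693), and the loss falls as the trained model assigns relatively more likelihood to the physically preferred continuation.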

Related papers