Fugu-MT Paper Translation (Summary): Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning

Paper summary: Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning

arxiv url: http://arxiv.org/abs/2603.19607v1
Date: Fri, 20 Mar 2026 03:25:41 GMT
Status: Translation complete
Last updated in system: 2026-03-23 19:48:38.96728
Title: Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning
Title (reference translation): Physion-Eval: Evaluation of Physical Realism in Generated Video through Human Reasoning
Authors: Qin Zhang, Peiyu Jing, Hong-Xing Yu, Fangqiang Ding, Fan Nie, Weimin Wang, Yilun Du, James Zou, Jiajun Wu, Bing Shuai
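Abstract summary: Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. Existing evaluations largely rely on automated metrics or coarse human judgments such as preferences or rubric-based checks. We introduce Physion-Eval, a benchmark of expert human reasoning for diagnosing physical realism failures in videos generated by five state-of-the-art models.
Reference score (independently computed attention level): 77.34919361116037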
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. As these models advance, a key question arises: do generated videos obey the physical laws of the real world? Existing evaluations largely rely on automated metrics or coarse human judgments such as preferences or rubric-based checks. While useful for assessing perceptual quality, these methods provide limited insight into when and why generated dynamics violate real-world physical constraints. We introduce Physion-Eval, a large-scale benchmark of expert human reasoning for diagnosing physical realism failures in videos generated by five state-of-the-art models across egocentric and exocentric views, containing 10,990 expert reasoning traces spanning 22 fine-grained physical categories. Each generated video is derived from a corresponding real-world reference video depicting a clear physical process, and annotated with temporally localized glitches, structured failure categories, and natural-language explanations of the violated physical behavior. Using this dataset, we reveal a striking limitation of current video generation models: in physics-critical scenarios, 83.3% of exocentric and 93.5% of egocentric generated videos exhibit at least one human-identifiable physical glitch. We hope Physion-Eval will set a new standard for physical realism evaluation and guide the development of physics-grounded video generation. The benchmark is publicly available at https://huggingface.co/datasets/PhysionLabs/Physion-Eval.
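Abstract (reference translation): Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. Do generated videos obey the physical laws of the real world? Existing evaluations largely rely on automated metrics or coarse human judgments such as preferences or rubric-based checks. While useful for assessing perceptual quality, these methods offer limited insight into when and why generated dynamics violate real-world physical constraints. We introduce Physion-Eval, a large-scale benchmark of expert reasoning for diagnosing physical realism failures in videos generated by five state-of-the-art models across egocentric and exocentric views. Each generated video is derived from a corresponding real-world reference video depicting a clear physical process, and is annotated with temporally localized glitches, structured failure categories, and natural-language explanations of the violated physical behavior. In physics-critical scenarios, 83.3% of exocentric and 93.5% of egocentric generated videos exhibit at least one human-identifiable physical glitch. We hope Physion-Eval will set a new standard for physical realism evaluation and guide the development of physics-grounded video generation. The benchmark is publicly available at https://huggingface.co/datasets/PhysionLabs/Physion-Eval.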
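
The headline statistic in the abstract (the share of generated videos with at least one human-identifiable glitch, reported per view) can be illustrated with a minimal Python sketch. The record layout below is an assumption made for illustration only: the class names `Glitch` and `AnnotatedVideo`, their fields, the function `glitch_rate`, and the toy data are all hypothetical and are not taken from the published dataset, whose actual schema may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Glitch:
    """One temporally localized glitch (hypothetical schema, not the dataset's)."""
    start_s: float      # start of the glitch within the video, in seconds
    end_s: float        # end of the glitch within the video, in seconds
    category: str       # structured failure category label
    explanation: str    # natural-language explanation of the violated behavior


@dataclass
class AnnotatedVideo:
    """A generated video plus its expert annotations (hypothetical schema)."""
    video_id: str
    view: str                                   # "egocentric" or "exocentric"
    glitches: list[Glitch] = field(default_factory=list)


def glitch_rate(videos: list[AnnotatedVideo], view: str) -> float:
    """Fraction of videos in the given view with at least one annotated glitch."""
    subset = [v for v in videos if v.view == view]
    if not subset:
        raise ValueError(f"no videos with view {view!r}")
    return sum(1 for v in subset if v.glitches) / len(subset)


# Toy data invented for this example; it does not reproduce the paper's numbers.
toy_videos = [
    AnnotatedVideo("exo-1", "exocentric",
                   [Glitch(1.0, 1.8, "collision", "ball passes through the wall")]),
    AnnotatedVideo("exo-2", "exocentric"),
    AnnotatedVideo("exo-3", "exocentric",
                   [Glitch(0.2, 0.9, "gravity", "cup hovers after release")]),
    AnnotatedVideo("ego-1", "egocentric",
                   [Glitch(2.5, 3.0, "fluid", "water volume grows while pouring")]),
    AnnotatedVideo("ego-2", "egocentric",
                   [Glitch(0.0, 0.5, "rigidity", "spoon bends during stirring")]),
]

exo_rate = glitch_rate(toy_videos, "exocentric")
ego_rate = glitch_rate(toy_videos, "egocentric")
print(f"exocentric: {exo_rate:.1%}, egocentric: {ego_rate:.1%}")
```

A video counts toward the rate as soon as it has one annotated glitch, regardless of how many glitches it contains, which matches the "at least one" wording of the abstract.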

