Related papers: PAI-Bench: A Comprehensive Benchmark For Physical AI

URL: http://arxiv.org/abs/2512.01989v1
Date: Mon, 01 Dec 2025 18:47:39 GMT
Title: PAI-Bench: A Comprehensive Benchmark For Physical AI
Authors: Fengzhe Zhou, Jiannan Huang, Jialuo Li, Deva Ramanan, Humphrey Shi,
Abstract summary: Video generative models often struggle to maintain physically coherent dynamics. Multi-modal large language models exhibit limited performance in forecasting and causal interpretation. These observations suggest that current systems are still at an early stage in handling the perceptual and predictive demands of Physical AI.
Score: 70.22914615084215
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Physical AI aims to develop models that can perceive and predict real-world dynamics; yet, the extent to which current multi-modal large language models and video generative models support these abilities is insufficiently understood. We introduce Physical AI Bench (PAI-Bench), a unified and comprehensive benchmark that evaluates perception and prediction capabilities across video generation, conditional video generation, and video understanding, comprising 2,808 real-world cases with task-aligned metrics designed to capture physical plausibility and domain-specific reasoning. Our study provides a systematic assessment of recent models and shows that video generative models, despite strong visual fidelity, often struggle to maintain physically coherent dynamics, while multi-modal large language models exhibit limited performance in forecasting and causal interpretation. These observations suggest that current systems are still at an early stage in handling the perceptual and predictive demands of Physical AI. In summary, PAI-Bench establishes a realistic foundation for evaluating Physical AI and highlights key gaps that future systems must address.

Related papers

WorldBench: Disambiguating Physics for Diagnostic Evaluation of World Models [17.757245394765807]
We introduce WorldBench, a video-based benchmark specifically designed for concept-specific, disentangled evaluation. WorldBench offers a more nuanced and scalable framework for rigorously evaluating the physical reasoning capabilities of video generation and world models.
arXiv (2026-01-29T05:31:02Z)
Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI [57.44526951497041]
We advocate for intelligent systems that ground learning in both physical principles and embodied reasoning processes. Our synthesis envisions next-generation world models capable of explaining physical phenomena and predicting future states.
arXiv (2025-10-06T16:16:03Z)
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation [54.3628937181904]
Internal world models (WMs) enable agents to understand the world's state and predict transitions. Recent large Vision-Language Models (VLMs), such as OpenAI o3, GPT-4o and Gemini, exhibit potential as general-purpose WMs.
arXiv (2025-06-27T03:24:29Z)
Generative Physical AI in Vision: A Survey [78.07014292304373]
Generative Artificial Intelligence (AI) has rapidly advanced the field of computer vision by enabling machines to create and interpret visual data with unprecedented sophistication. This transformation builds on generative models that produce realistic images, videos, and 3D/4D content. As generative models increasingly integrate physical realism and dynamic simulation, their potential to function as "world simulators" expands.
arXiv (2025-01-19T03:19:47Z)
Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction [0.1534667887016089]
We propose a novel architecture that aligns learned latent representations with real-world physical quantities. Three case studies demonstrate that our approach achieves physical interpretability and accurate state predictions.
arXiv (2024-12-17T12:51:24Z)
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts. However, the capacity of these models to accurately represent intuitive physics remains largely unexplored. We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv (2024-10-07T17:56:04Z)
Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective [2.61072980439312]
Devising formalisms to develop internal world models is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations of the prevalent use of state space models as internal world models. The structure of models in these formalisms facilitates exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. These formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the nature of the real world and quantify the confidence in its predictions.
arXiv (2024-04-24T12:41:04Z)
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy. We also introduce an oracle model (ContPRO) that combines particle-based physical dynamics models with recent large language models.
arXiv (2024-02-09T01:09:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.