PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education
- URL: http://arxiv.org/abs/2601.00943v1
- Date: Fri, 02 Jan 2026 18:42:02 GMT
- Title: PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education
- Authors: Megha Mariam K. M, Aditya Arun, Zakaria Laskar, C. V. Jawahar
- Abstract summary: The benchmark is designed to assess how well T2V models can convey core physics concepts through visual illustrations. Our aim is to systematically explore the feasibility of using T2V models to generate high-quality, curriculum-aligned educational content.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI models, particularly Text-to-Video (T2V) systems, offer a promising avenue for transforming science education by automating the creation of engaging and intuitive visual explanations. In this work, we take a first step toward evaluating their potential in physics education by introducing a dedicated benchmark for explanatory video generation. The benchmark is designed to assess how well T2V models can convey core physics concepts through visual illustrations. Each physics concept in our benchmark is decomposed into granular teaching points, with each point accompanied by a carefully crafted prompt intended for visual explanation of the teaching point. T2V models are evaluated on their ability to generate accurate videos in response to these prompts. Our aim is to systematically explore the feasibility of using T2V models to generate high-quality, curriculum-aligned educational content, paving the way toward scalable, accessible, and personalized learning experiences powered by AI. Our evaluation reveals that current models produce visually coherent videos with smooth motion and minimal flickering, yet their conceptual accuracy is less reliable. Performance in areas such as mechanics, fluids, and optics is encouraging, but models struggle with electromagnetism and thermodynamics, where abstract interactions are harder to depict. These findings underscore the gap between visual quality and conceptual correctness in educational video generation. We hope this benchmark helps the community close that gap and move toward T2V systems that can deliver accurate, curriculum-aligned physics content at scale. The benchmark and accompanying codebase are publicly available at https://github.com/meghamariamkm/PhyEduVideo.
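The evaluation protocol described in the abstract (concepts decomposed into teaching points, each with a prompt, and generated videos scored for accuracy) can be sketched in a few lines. This is an illustrative sketch only: `TeachingPoint`, `generate_video`, and `score_accuracy` are hypothetical stand-ins, not the benchmark's actual API, and the example data is invented.

```python
from dataclasses import dataclass

@dataclass
class TeachingPoint:
    concept: str   # topic area, e.g. "mechanics"
    point: str     # granular teaching point within the concept
    prompt: str    # crafted prompt for a visual explanation

# Tiny illustrative benchmark slice (contents invented for this sketch).
BENCHMARK = [
    TeachingPoint("mechanics", "projectile motion is parabolic",
                  "Show a ball launched at 45 degrees tracing a parabolic arc."),
    TeachingPoint("optics", "light refracts at an interface",
                  "Show a light ray bending as it passes from air into water."),
]

def generate_video(prompt: str) -> bytes:
    """Stand-in for a T2V model call; returns placeholder bytes."""
    return prompt.encode()

def score_accuracy(video: bytes, tp: TeachingPoint) -> float:
    """Stand-in for a human or automated judge; returns a score in [0, 1]."""
    return 1.0 if video else 0.0

def evaluate(benchmark: list[TeachingPoint]) -> dict[str, float]:
    """Generate one video per teaching point and average scores per concept."""
    per_concept: dict[str, list[float]] = {}
    for tp in benchmark:
        video = generate_video(tp.prompt)
        per_concept.setdefault(tp.concept, []).append(score_accuracy(video, tp))
    return {c: sum(s) / len(s) for c, s in per_concept.items()}

print(evaluate(BENCHMARK))
```

Grouping scores per concept mirrors the paper's per-area reporting (e.g. mechanics vs. electromagnetism), which is what exposes the gap between visual quality and conceptual correctness.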
Related papers
- PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance [2.2606796828967823]
Current video generation models produce high-quality aesthetic videos but often struggle to learn representations of real-world physics dynamics. We propose PhysVideoGenerator, a proof-of-concept framework that embeds a learnable physics prior into the video generation process. We introduce a lightweight predictor network, PredictorP, which regresses high-level physical features extracted from a pre-trained Video Joint Embedding Predictive Architecture.
arXiv Detail & Related papers (2026-01-07T07:38:58Z)
- MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis [20.319439629468263]
We study Newtonian motion-controlled text-to-video generation and evaluation, emphasizing physical precision and motion coherence. We introduce MoReGen, a motion-aware, physics-grounded T2V framework that generates physically accurate videos from text prompts in the code domain. Our results reveal that state-of-the-art models struggle to maintain physical validity, while MoReGen establishes a direction toward physically coherent video synthesis.
arXiv Detail & Related papers (2025-12-03T19:44:04Z)
- Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement [51.54051161067026]
We propose an iterative self-refinement framework to provide physics-aware guidance for video generation. We introduce a multimodal chain-of-thought (MM-CoT) process that refines prompts based on feedback from physical inconsistencies. Experiments on the PhyIQ benchmark show that our method improves the Physics-IQ score from 56.31 to 62.38.
arXiv Detail & Related papers (2025-11-25T13:09:03Z)
- LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference [57.086932851733145]
We introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models. We benchmark intuitive physics understanding in current video diffusion models. Empirical results show that, despite current models struggling with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.
arXiv Detail & Related papers (2025-10-13T15:19:07Z)
- Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models [14.187604603759784]
We present PhysVidBench, a benchmark designed to evaluate the physical reasoning capabilities of text-to-video systems. For each prompt, we generate videos using diverse state-of-the-art models and adopt a three-stage evaluation pipeline. PhysVidBench provides a structured, interpretable framework for assessing physical commonsense in generative video models.
arXiv Detail & Related papers (2025-07-21T17:30:46Z)
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models [53.204403109208506]
Current text-to-video (T2V) models often struggle to generate physically plausible content. We propose VideoREPA, which distills physics understanding capability from understanding foundation models into T2V models.
arXiv Detail & Related papers (2025-05-29T17:06:44Z)
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z)
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective [94.2662603491163]
Existing evaluation protocols primarily focus on temporal consistency and content continuity.
We propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models.
arXiv Detail & Related papers (2024-07-01T08:51:22Z)
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.