Related papers: Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery


URL: http://arxiv.org/abs/2508.17380v1
Date: Sun, 24 Aug 2025 14:34:21 GMT
Title: Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery
Authors: Jiaqi Liu, Songning Lai, Pengze Li, Di Yu, Wenjie Zhou, Yiyang Zhou, Peng Xia, Zijun Wang, Xi Chen, Shixiang Tang, Lei Bai, Wanli Ouyang, Mingyu Ding, Huaxiu Yao, Aoran Wang
Abstract summary: VIPER-R1 is a multimodal model that performs Visual Induction for Physics-based Equation Reasoning. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. It consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability.
Score: 98.58830663687911
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated discovery of physical laws from observational data in the real world is a grand challenge in AI. Current methods, relying on symbolic regression or LLMs, are limited to uni-modal data and overlook the rich, visual phenomenological representations of motion that are indispensable to physicists. This "sensory deprivation" severely weakens their ability to interpret the inherent spatio-temporal patterns within dynamic phenomena. To address this gap, we propose VIPER-R1, a multimodal model that performs Visual Induction for Physics-based Equation Reasoning to discover fundamental symbolic formulas. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. The model is trained via a curriculum of Motion Structure Induction (MSI), using supervised fine-tuning to interpret kinematic phase portraits and to construct hypotheses guided by a Causal Chain of Thought (C-CoT), followed by Reward-Guided Symbolic Calibration (RGSC) to refine the formula structure with reinforcement learning. During inference, the trained VIPER-R1 acts as an agent: it first posits a high-confidence symbolic ansatz, then proactively invokes an external symbolic regression tool to perform Symbolic Residual Realignment (SR^2). This final step, analogous to a physicist's perturbation analysis, reconciles the theoretical model with empirical data. To support this research, we introduce PhysSymbol, a new 5,000-instance multimodal corpus. Experiments show that VIPER-R1 consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability, enabling more precise discovery of physical laws. Project page: https://jiaaqiliu.github.io/VIPER-R1/
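The ansatz-then-residual step described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the synthetic damped-oscillator data, the candidate term library, and the single-term least-squares fit standing in for the external symbolic regression tool are all assumptions made for illustration, as are the names `fit_single_term` and `realign`.

```python
# Hypothetical illustration only: none of these names come from VIPER-R1.

def fit_single_term(residual, feature):
    """Least-squares coefficient c minimising sum((residual - c*feature)^2)."""
    denom = sum(f * f for f in feature)
    coef = sum(r * f for r, f in zip(residual, feature)) / denom
    sse = sum((r - coef * f) ** 2 for r, f in zip(residual, feature))
    return coef, sse

def realign(samples, ansatz, candidates):
    """Pick the candidate term that best explains what the ansatz misses."""
    # Residual = observed acceleration minus the ansatz prediction.
    residual = [a - ansatz(x, v) for x, v, a in samples]
    best = None
    for name, term in candidates.items():
        feature = [term(x, v) for x, v, _ in samples]
        coef, sse = fit_single_term(residual, feature)
        if best is None or sse < best[2]:
            best = (name, coef, sse)
    return best

# Assumed ground truth for the toy data: a = -4.0*x - 0.5*v
grid = [i / 4 for i in range(-4, 5)]
samples = [(x, v, -4.0 * x - 0.5 * v) for x in grid for v in grid]

# Assumed ansatz: the dominant restoring-force term only.
ansatz = lambda x, v: -4.0 * x

# Assumed library of correction terms to test against the residual.
candidates = {
    "v": lambda x, v: v,
    "x**3": lambda x, v: x ** 3,
    "v*abs(v)": lambda x, v: v * abs(v),
}

name, coef, sse = realign(samples, ansatz, candidates)
print(f"a = -4.0*x + ({coef:.3f})*{name}")
```

In this sketch the ansatz captures the dominant structure and the residual fit recovers the missing linear damping term. A real symbolic regression tool would search a much larger expression space than this three-term library.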

Related papers

P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads [91.05736019384489]
We introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model to secure 12 gold medals and achieves state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2026-02-10T06:28:08Z)
PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models [40.16417939211015]
Modern foundational Multimodal Large Language Models (MLLMs) and video world models have advanced significantly in mathematical, common-sense, and visual reasoning. Existing benchmarks attempting to measure this matter rely on synthetic, Visual Question Answer templates or focus on perceptual video quality that is tangential to measuring how well the video abides by physical laws. We introduce PhysicsMind, a unified benchmark that evaluates law-consistent reasoning and generation over three canonical principles: Center of Mass, Lever Equilibrium, and Newton's First Law.
arXiv Detail & Related papers (2026-01-22T14:33:01Z)
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models [100.65199317765608]
Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces. We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv Detail & Related papers (2026-01-16T08:40:10Z)
ProPhy: Progressive Physical Alignment for Dynamic World Simulation [55.456455952212416]
ProPhy is a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation. We show that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-05T09:39:26Z)
Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model [13.900251746035012]
We investigate the internal representations of a large physics-focused foundation model. By injecting concept directions back into the model during inference, we can steer its predictions. Our findings open new avenues for understanding and controlling scientific foundation models.
arXiv Detail & Related papers (2025-11-25T19:40:22Z)
Universal Physics Simulation: A Foundational Diffusion Approach [0.0]
We present the first foundational AI model for universal physics simulation that learns physical laws directly from boundary-condition data. Our sketch-guided diffusion transformer approach reimagines computational physics by treating simulation as a conditional generation problem. Unlike sequential time-stepping methods that accumulate errors over iterations, our approach bypasses temporal integration entirely.
arXiv Detail & Related papers (2025-07-13T18:12:34Z)
Inferring Interpretable Models of Fragmentation Functions using Symbolic Regression [10.091537548478655]
We present the first study that infers, directly from experimental data, a functional form of fragmentation functions. This study represents an approach to follow in such QCD-related phenomenology studies and more generally in sciences.
arXiv Detail & Related papers (2025-01-13T08:25:14Z)
A Phenomenological AI Foundation Model for Physical Signals [1.204553980682492]
We develop and train a model on 0.59 billion samples of cross-modal sensor measurements. No prior knowledge of physical laws or inductive biases were introduced into the model. We demonstrate that a single foundation model could effectively encode and predict physical behaviors.
arXiv Detail & Related papers (2024-10-15T21:03:53Z)
PhyRecon: Physically Plausible Neural Scene Reconstruction [81.73129450090684]
We introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations. Central to this design is an efficient transformation between SDF-based implicit representations and explicit surface points. Our results also exhibit superior physical stability in physical simulators, with at least a 40% improvement across all datasets.
arXiv Detail & Related papers (2024-04-25T15:06:58Z)
Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models. DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems. We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
Learning Physical Dynamics with Subequivariant Graph Neural Networks [99.41677381754678]
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics. Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization. Our model achieves on average over 3% enhancement in contact prediction accuracy across 8 scenarios on Physion and 2X lower rollout MSE on RigidFall.
arXiv Detail & Related papers (2022-10-13T10:00:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.