Related papers: Physics simulation capabilities of LLMs

URL: http://arxiv.org/abs/2312.02091v2
Date: Mon, 2 Sep 2024 10:02:51 GMT
Title: Physics simulation capabilities of LLMs
Authors: Mohamad Ali-Dib, Kristen Menou
Abstract summary: Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding. We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: [Abridged abstract] Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding. Combining these two capabilities could one day enable AI systems to simulate and predict the physical world. We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems. We condition LLM generation on the use of well-documented and widely-used packages to elicit coding capabilities in the physics and astrophysics domains. We contribute ~50 original and challenging problems in celestial mechanics (with REBOUND), stellar physics (with MESA), 1D fluid dynamics (with Dedalus) and non-linear dynamics (with SciPy). Since our problems do not admit unique solutions, we evaluate LLM performance on several soft metrics: counts of lines that contain different types of errors (coding, physics, necessity and sufficiency) as well as a more "educational" Pass-Fail metric focused on capturing the salient physical ingredients of the problem at hand. As expected, today's SOTA LLM (GPT4) zero-shot fails most of our problems, although about 40% of the solutions could plausibly get a passing grade. About 70-90% of the code lines produced are necessary, sufficient and correct (coding & physics). Physics and coding errors are the most common, with some unnecessary or insufficient lines. We observe significant variations across problem class and difficulty. We identify several failure modes of GPT4 in the computational physics domain. Our reconnaissance work provides a snapshot of current computational capabilities in classical physics and points to obvious improvement targets if AI systems are ever to reach a basic level of autonomy in physics simulation capabilities.

Related papers

PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems [3.0901186959880977]
We evaluate the performance of frontier LLMs in solving physics problems, both mathematical and descriptive. We introduce a new evaluation benchmark for physics problems, PhysicsEval, consisting of 19,609 problems sourced from various physics textbooks.
arXiv Detail & Related papers (2025-07-31T18:12:51Z)
ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems [21.278539804482012]
Large Language Models (LLMs) have shown impressive performance in domains such as mathematics and programming. Physics poses unique challenges that demand not only precise computation but also deep conceptual understanding and physical modeling skills. Existing benchmarks often fall short due to limited difficulty, multiple-choice formats, and static evaluation settings.
arXiv Detail & Related papers (2025-07-07T08:43:56Z)
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models [69.73115077227969]
We present PhysUniBench, a large-scale benchmark designed to evaluate and improve the reasoning capabilities of multimodal large language models (MLLMs). PhysUniBench consists of 3,304 physics questions spanning 8 major sub-disciplines of physics, each accompanied by one visual diagram. The benchmark's construction involved a rigorous multi-stage process, including multiple roll-outs, expert-level evaluation, automated filtering of easily solved problems, and a nuanced difficulty grading system with five levels.
arXiv Detail & Related papers (2025-06-21T09:55:42Z)
Can Theoretical Physics Research Benefit from Language Agents? [50.57057488167844]
Large Language Models (LLMs) are rapidly advancing across diverse domains, yet their application in theoretical physics research is not yet mature. This position paper argues that LLM agents can potentially help accelerate theoretical, computational, and applied physics when properly integrated with domain knowledge and toolbox. We envision future physics-specialized LLMs that could handle multimodal data, propose testable hypotheses, and design experiments.
arXiv Detail & Related papers (2025-06-06T16:20:06Z)
Scaling Physical Reasoning with the PHYSICS Dataset [32.956687630330116]
PHYSICS is a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses.
arXiv Detail & Related papers (2025-05-21T17:06:28Z)
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability [8.441945838936444]
We present FEABench, a benchmark to evaluate the ability of large language models (LLMs) and LLM agents to simulate and solve physics, mathematics and engineering problems using finite element analysis (FEA). We introduce a comprehensive evaluation scheme to investigate the ability of LLMs to solve these problems end-to-end by reasoning over natural language problem descriptions and operating COMSOL Multiphysics®, an FEA software, to compute the answers.
arXiv Detail & Related papers (2025-04-08T17:59:39Z)
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation [90.00687889213991]
Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities. Vision-language models (VLMs) pretrained on Internet data could in principle offer a framework for tackling such problems. In this paper, we introduce a novel test-time framework that enhances VLMs' physical reasoning capabilities for multi-stage manipulation tasks.
arXiv Detail & Related papers (2025-02-23T20:42:15Z)
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models [39.917074900737575]
Large language models (LLMs) have demonstrated remarkable capabilities in solving complex reasoning tasks. The domain of physics reasoning presents unique challenges that have received significantly less attention. Existing benchmarks often fall short in evaluating LLMs' abilities on the breadth and depth of undergraduate-level physics.
arXiv Detail & Related papers (2025-02-01T06:42:02Z)
Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback [33.000541253136745]
Large Language Models (LLMs) have demonstrated strong capabilities in text-based tasks but struggle with the complex reasoning required for physics problems. This paper presents a novel approach to improving LLM performance on physics questions using Reinforcement Learning with Human and Artificial Intelligence Feedback (RLHAIF).
arXiv Detail & Related papers (2024-12-06T21:17:47Z)
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering [32.87943023416162]
We propose an LMM-based model to answer multimodal physics MCQs. For domain adaptation, we utilize the MM-PhyQA dataset comprising Indian high school-level multimodal physics problems. In image captioning, we add a detailed explanation of the diagram in each image, minimizing hallucinations and image processing errors.
arXiv Detail & Related papers (2024-04-19T14:52:57Z)
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. One essential and frequently observed finding is that when math questions are slightly changed, LLMs can behave incorrectly. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
Building Flexible Machine Learning Models for Scientific Computing at Scale [35.41293100957156]
We present OmniArch, the first prototype aiming at solving multi-scale and multi-physics scientific computing problems with physical alignment. To our knowledge, we are the first to conduct 1D-2D-3D united pre-training on PDEBench, which not only sets new performance benchmarks for 1D, 2D, and 3D PDEs but also demonstrates exceptional adaptability to new physics via in-context and zero-shot learning approaches.
arXiv Detail & Related papers (2024-02-25T07:19:01Z)
Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level [0.0]
A large language model (LLM) pre-trained on text can solve not only pure math word problems but also physics word problems. Our work is the first research to focus on the automatic solving, explanation, and generation of physics word problems.
arXiv Detail & Related papers (2023-09-15T06:13:06Z)
Learning Controllable Adaptive Simulation for Multi-resolution Physics [86.8993558124143]
We introduce Learning controllable Adaptive simulation for Multi-resolution Physics (LAMP) as the first full deep learning-based surrogate model. LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening. We demonstrate that LAMP outperforms state-of-the-art deep learning surrogate models and can adaptively trade off computation to improve long-term prediction error.
arXiv Detail & Related papers (2023-05-01T23:20:27Z)
Physics Embedded Machine Learning for Electromagnetic Data Imaging [83.27424953663986]
Electromagnetic (EM) imaging is widely applied in sensing for security, biomedicine, geophysics, and various industries. It is an ill-posed inverse problem whose solution is usually computationally expensive. Machine learning (ML) techniques and especially deep learning (DL) show potential in fast and accurate imaging. This article surveys various schemes to incorporate physics in learning-based EM imaging.
arXiv Detail & Related papers (2022-07-26T02:10:15Z)
An extended physics informed neural network for preliminary analysis of parametric optimal control problems [0.0]
We propose an extension of physics informed supervised learning strategies to parametric partial differential equations. Our main goal is to provide a physics informed learning paradigm to simulate parametrized phenomena in a small amount of time.
arXiv Detail & Related papers (2021-10-26T09:39:05Z)
A Review of Physics-based Machine Learning in Civil Engineering [0.0]
Machine learning (ML) is a significant tool that can be applied across many disciplines. ML models for civil engineering applications that are simulated in the lab often fail in real-world tests. This paper reviews the history of physics-based ML and its application in civil engineering.
arXiv Detail & Related papers (2021-10-09T15:50:21Z)
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics [89.81550748680245]
We introduce a new differentiable physics benchmark called PlasticineLab. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark.
arXiv Detail & Related papers (2021-04-07T17:59:23Z)
Data-Efficient Learning for Complex and Real-Time Physical Problem Solving using Augmented Simulation [49.631034790080406]
We present a task for navigating a marble to the center of a circular maze. We present a model that learns to move a marble in the complex environment within minutes of interacting with the real system.
arXiv Detail & Related papers (2020-11-14T02:03:08Z)
Scalable Differentiable Physics for Learning and Control [99.4302215142673]
Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments. We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions.
arXiv Detail & Related papers (2020-07-04T19:07:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.