Think like a Scientist: Physics-guided LLM Agent for Equation Discovery
- URL: http://arxiv.org/abs/2602.12259v1
- Date: Thu, 12 Feb 2026 18:49:27 GMT
- Title: Think like a Scientist: Physics-guided LLM Agent for Equation Discovery
- Authors: Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad, Sharvaree Vadgama, Rose Yu
- Abstract summary: Large language models (LLMs) have emerged as promising tools for symbolic equation discovery. We introduce KeplerAgent, an agentic framework that explicitly follows the multi-step reasoning process scientists use: first inferring physical properties such as symmetries, then using them as priors to restrict the space of candidate equations. KeplerAgent achieves substantially higher symbolic accuracy and greater robustness to noisy data than both LLM and traditional baselines.
- Score: 22.586956876641406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. Recently, large language models (LLMs) have emerged as promising tools for symbolic equation discovery, owing to their broad domain knowledge and strong reasoning capabilities. However, most existing LLM-based systems try to guess equations directly from data, without modeling the multi-step reasoning process that scientists often follow: first inferring physical properties such as symmetries, then using these as priors to restrict the space of candidate equations. We introduce KeplerAgent, an agentic framework that explicitly follows this scientific reasoning process. The agent coordinates physics-based tools to extract intermediate structure and uses these results to configure symbolic regression engines such as PySINDy and PySR, including their function libraries and structural constraints. Across a suite of physical equation benchmarks, KeplerAgent achieves substantially higher symbolic accuracy and greater robustness to noisy data than both LLM and traditional baselines.
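The two-step process described in the abstract (infer a physical property, then use it to restrict the candidate equations) can be illustrated with a minimal sketch. This is not KeplerAgent's implementation and does not use the PySINDy or PySR APIs: the `LIBRARY` table, the helper names, the thresholds, and the synthetic data are all hypothetical, and a plain-Python sequentially thresholded least-squares fit stands in for a real symbolic regression engine.

```python
import random

# Hypothetical candidate library: term name -> (function, parity of the term).
LIBRARY = {
    "1": (lambda x: 1.0, "even"),
    "x": (lambda x: x, "odd"),
    "x^2": (lambda x: x * x, "even"),
    "x^3": (lambda x: x ** 3, "odd"),
}

def detect_parity(xs, ys, tol=0.05):
    """Return "odd", "even", or None by comparing y(x) with y(-x) on mirrored samples."""
    lookup = dict(zip(xs, ys))
    pairs = [(lookup[x], lookup[-x]) for x in xs if x > 0 and -x in lookup]
    if not pairs:
        return None
    scale = max(abs(y) for y in ys) or 1.0
    if all(abs(a + b) <= tol * scale for a, b in pairs):
        return "odd"
    if all(abs(a - b) <= tol * scale for a, b in pairs):
        return "even"
    return None

def solve_least_squares(columns, ys):
    """Solve the normal equations (A^T A) w = A^T y by Gauss-Jordan elimination."""
    k = len(columns)
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * y for a, y in zip(columns[i], ys))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def sparse_fit(xs, ys, names, threshold=0.1):
    """Sequentially thresholded least squares over the named library terms."""
    active = list(names)
    while active:
        columns = [[LIBRARY[n][0](x) for x in xs] for n in active]
        weights = solve_least_squares(columns, ys)
        kept = [n for n, w in zip(active, weights) if abs(w) >= threshold]
        if kept == active:
            return dict(zip(active, weights))
        active = kept  # drop small terms and refit on the remainder
    return {}

def discover(xs, ys):
    """Step 1: infer a symmetry. Step 2: fit using only terms consistent with it."""
    parity = detect_parity(xs, ys)
    names = [n for n, (_, p) in LIBRARY.items() if parity is None or p == parity]
    return parity, sparse_fit(xs, ys, names)

# Synthetic noisy data from the (assumed) ground truth y = 2x - 0.5x^3.
random.seed(0)
xs = [i / 10 for i in range(-20, 21)]
ys = [2.0 * x - 0.5 * x ** 3 + random.gauss(0.0, 0.01) for x in xs]
parity, model = discover(xs, ys)
print(parity, model)
```

In this sketch the parity check removes the even terms before any fitting happens, so the regression only has to choose among candidates that already respect the detected symmetry.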
Related papers
- DISCOVER: A Physics-Informed, GPU-Accelerated Symbolic Regression Framework [0.0]
Symbolic Regression (SR) enables the discovery of interpretable mathematical relationships from experimental and simulation data. This paper introduces DISCOVER, an open-source symbolic regression package developed to address these challenges through a modular, physics-motivated design. The software is intended for applications in computational physics, computational chemistry, and materials science, where interpretability, physical consistency, and execution time are important.
arXiv Detail & Related papers (2026-01-27T16:33:35Z)
- An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
- SR-Scientist: Scientific Equation Discovery With Agentic AI [27.014966811260212]
We present SR-Scientist, a framework that turns Large Language Models (LLMs) from simple equation proposers into autonomous AI scientists. Specifically, we wrap the code interpreter into a set of tools for data analysis and equation evaluation. Empirical results show that SR-Scientist outperforms baseline methods by an absolute margin of 6% to 35% across the evaluated datasets.
arXiv Detail & Related papers (2025-10-13T17:35:23Z)
- SciML Agents: Write the Solver, Not the Solution [69.5021018644143]
We introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems and a large-scale benchmark of 1,000 diverse ODE tasks. We evaluate open- and closed-source LLMs along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants. Preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
arXiv Detail & Related papers (2025-09-12T02:53:57Z)
- Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery [98.58830663687911]
VIPER-R1 is a multimodal model that performs Visual Induction for Physics-based Equation Reasoning. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. It consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability.
arXiv Detail & Related papers (2025-08-24T14:34:21Z)
- DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience [14.093206703519103]
DrSR is a framework that combines data-driven insight with reflective learning to enhance both robustness and discovery capability. Experiments across interdisciplinary datasets in physics, chemistry, biology, and materials science demonstrate that DrSR substantially improves the valid equation rate.
arXiv Detail & Related papers (2025-06-04T04:52:34Z)
- MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data [22.262191225577244]
We explore whether a similar approach can be applied to scientific foundation models (SFMs).
We collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries.
We provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrates that pre-training remains robust even with approximated prior data.
arXiv Detail & Related papers (2024-10-09T00:52:00Z)
- Symbolic Learning Enables Self-Evolving Agents [55.625275970720374]
We introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own.
Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks.
arXiv Detail & Related papers (2024-06-26T17:59:18Z)
- LLM-SR: Scientific Equation Discovery via Programming with Large Language Models [17.64574496035502]
Current methods of equation discovery, commonly known as symbolic regression, largely focus on extracting equations from data alone. We introduce LLM-SR, a novel approach that leverages the scientific knowledge and robust code generation capabilities of Large Language Models. We show that LLM-SR discovers physically accurate equations that significantly outperform state-of-the-art symbolic regression baselines.
arXiv Detail & Related papers (2024-04-29T03:30:06Z)
- SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models [57.96527452844273]
We introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning.
We curated a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs.
To verify the effectiveness of SciInstruct, we fine-tuned different language models with SciInstruct, i.e., ChatGLM3 (6B and 32B), Llama3-8B-Instruct, and Mistral-7B: MetaMath.
arXiv Detail & Related papers (2024-01-15T20:22:21Z)
- SimLM: Can Language Models Infer Parameters of Physical Systems? [56.38608628187024]
We investigate the performance of Large Language Models (LLMs) at performing parameter inference in the context of physical systems.
Our experiments suggest that they are not inherently suited to this task, even for simple systems.
We propose a promising direction of exploration, which involves the use of physical simulators to augment the context of LLMs.
arXiv Detail & Related papers (2023-12-21T12:05:19Z)
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.