SR-Scientist: Scientific Equation Discovery With Agentic AI
- URL: http://arxiv.org/abs/2510.11661v1
- Date: Mon, 13 Oct 2025 17:35:23 GMT
- Title: SR-Scientist: Scientific Equation Discovery With Agentic AI
- Authors: Shijie Xia, Yuhan Sun, Pengfei Liu
- Abstract summary: We present SR-Scientist, a framework that elevates Large Language Models (LLMs) from simple equation proposers to autonomous AI scientists. Specifically, we wrap the code interpreter into a set of tools for data analysis and equation evaluation. Empirical results show that SR-Scientist outperforms baseline methods by an absolute margin of 6% to 35% across datasets.
- Score: 27.014966811260212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Large Language Models (LLMs) have been applied to scientific equation discovery, leveraging their embedded scientific knowledge for hypothesis generation. However, current methods typically confine LLMs to the role of an equation proposer within search algorithms like genetic programming. In this paper, we present SR-Scientist, a framework that elevates the LLM from a simple equation proposer to an autonomous AI scientist that writes code to analyze data, implements the equation as code, submits it for evaluation, and optimizes the equation based on experimental feedback. Specifically, we wrap the code interpreter into a set of tools for data analysis and equation evaluation. The agent is instructed to optimize the equation by utilizing these tools over a long horizon with minimal human-defined pipelines. Empirical results show that SR-Scientist outperforms baseline methods by an absolute margin of 6% to 35% on datasets covering four science disciplines. Additionally, we demonstrate our method's robustness to noise, the generalization of the discovered equations to out-of-domain data, and their symbolic accuracy. Furthermore, we develop an end-to-end reinforcement learning framework to enhance the agent's capabilities.
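As a rough illustration of the loop the abstract describes -- implement a candidate equation as code, submit it to an evaluation tool, and use the score as feedback -- here is a minimal self-contained sketch. The dataset, candidate equations, and `evaluate` tool are invented for illustration and are not SR-Scientist's actual interface:

```python
import numpy as np

# Toy dataset: samples from an unknown law (here y = 2*x^2 + 3).
rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, 200)
y = 2.0 * x**2 + 3.0

def evaluate(equation):
    """Evaluation 'tool': score a candidate equation (implemented as code) by MSE."""
    return float(np.mean((equation(x) - y) ** 2))

# Two candidate equations an agent might submit; each fits its own constants.
def linear(x):
    a, b = np.polyfit(x, y, 1)          # least-squares fit of y ~ a*x + b
    return a * x + b

def quadratic(x):
    a, b, c = np.polyfit(x, y, 2)       # least-squares fit of y ~ a*x^2 + b*x + c
    return a * x**2 + b * x + c

# Feedback step: the scores tell the agent which hypothesis to keep refining.
scores = {"linear": evaluate(linear), "quadratic": evaluate(quadratic)}
best = min(scores, key=scores.get)
print(best, scores)
```

In the actual framework the candidates would be proposed and revised by the LLM agent over many such evaluate-and-optimize iterations, with minimal human-defined pipeline.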
Related papers
- Think like a Scientist: Physics-guided LLM Agent for Equation Discovery [22.586956876641406]
Large language models (LLMs) have emerged as promising tools for symbolic equation discovery. We introduce KeplerAgent, an agentic framework that explicitly follows this scientific reasoning process. KeplerAgent achieves substantially higher symbolic accuracy and greater robustness to noisy data than both LLM and traditional baselines.
arXiv Detail & Related papers (2026-02-12T18:49:27Z)
- An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
- Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization [69.36509281190662]
Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions.
arXiv Detail & Related papers (2025-12-02T18:42:26Z)
- TusoAI: Agentic Optimization for Scientific Methods [16.268579802762247]
Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis.
arXiv Detail & Related papers (2025-09-28T17:30:44Z)
- SciML Agents: Write the Solver, Not the Solution [69.5021018644143]
We introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems, and a large-scale benchmark of 1,000 diverse ODE tasks. We evaluate open- and closed-source LLMs along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants. Preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
arXiv Detail & Related papers (2025-09-12T02:53:57Z)
- DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience [14.093206703519103]
DrSR is a framework that combines data-driven insight with reflective learning to enhance both robustness and discovery capability. Experiments across interdisciplinary datasets in physics, chemistry, biology, and materials science demonstrate that DrSR substantially improves the valid equation rate.
arXiv Detail & Related papers (2025-06-04T04:52:34Z)
- Equation discovery framework EPDE: Towards a better equation discovery [50.79602839359522]
We enhance the EPDE algorithm -- an evolutionary optimization-based discovery framework. Our approach generates terms using fundamental building blocks such as elementary functions and individual differentials. We validate our algorithm's noise resilience and overall performance by comparing its results with those from the state-of-the-art equation discovery framework SINDy.
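For contrast with evolutionary term generation, discovery over a library of candidate terms can be sketched in the style of SINDy, the comparison baseline named above. The data and term library here are synthetic and purely illustrative:

```python
import numpy as np

# Synthetic data from du/dt = -0.5 * u, i.e. u(t) = exp(-0.5 t).
t = np.linspace(0.0, 10.0, 1000)
u = np.exp(-0.5 * t)
du = np.gradient(u, t)                  # numerical derivative of the observations

# Term library built from elementary building blocks.
library = np.column_stack([np.ones_like(u), u, u**2])
names = ["1", "u", "u^2"]

# Least-squares fit of du ~ library @ coefs, then hard-threshold for sparsity.
coefs, *_ = np.linalg.lstsq(library, du, rcond=None)
coefs[np.abs(coefs) < 1e-3] = 0.0

model = {n: c for n, c in zip(names, coefs) if c != 0.0}
print(model)   # recovers a model close to du/dt = -0.5 * u
```

EPDE's evolutionary search composes such terms from elementary functions and differentials rather than enumerating a fixed library, which is what lets it move beyond a hand-specified candidate set.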
arXiv Detail & Related papers (2024-12-28T15:58:44Z)
- LLM-SR: Scientific Equation Discovery via Programming with Large Language Models [17.64574496035502]
Current methods of equation discovery, commonly known as symbolic regression, largely focus on extracting equations from data alone. We introduce LLM-SR, a novel approach that leverages the scientific knowledge and robust code generation capabilities of Large Language Models. We show that LLM-SR discovers physically accurate equations that significantly outperform state-of-the-art symbolic regression baselines.
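The core idea of treating an equation hypothesis as a program skeleton whose constants are then fitted to data can be sketched as follows; the logarithmic hypothesis and synthetic dataset are invented for illustration and are not LLM-SR's actual code:

```python
import numpy as np

# Synthetic observations from a hidden law y = 3*log(x) + 1.
x = np.linspace(0.1, 5.0, 100)
y = 3.0 * np.log(x) + 1.0

def skeleton(x, a, b):
    """Hypothesized equation skeleton: logarithmic growth with free constants a, b."""
    return a * np.log(x) + b

# Fit the skeleton's free constants by linear least squares.
A = np.column_stack([np.log(x), np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
mse = float(np.mean((skeleton(x, a, b) - y) ** 2))
print(round(a, 3), round(b, 3), mse)
```

In LLM-SR the skeleton itself is proposed by the LLM from scientific priors, and the fitted error guides which skeletons to refine next.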
arXiv Detail & Related papers (2024-04-29T03:30:06Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is judged by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- SciMED: A Computational Framework For Physics-Informed Symbolic Regression with Scientist-In-The-Loop [0.0]
We present a novel, open-source computational framework called Scientist-Machine Equation Detector (SciMED).
SciMED integrates scientific discipline wisdom in a scientist-in-the-loop approach with state-of-the-art symbolic regression methods.
We show that SciMED is sufficiently robust to discover the correct physically meaningful symbolic expressions from noisy data.
arXiv Detail & Related papers (2022-09-13T18:31:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.