LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
- URL: http://arxiv.org/abs/2404.18400v2
- Date: Sun, 2 Jun 2024 20:17:59 GMT
- Title: LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
- Authors: Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K Reddy,
- Abstract summary: Traditional methods of equation discovery, known as symbolic regression, largely focus on extracting equations from data alone.
We introduce LLM-SR, a novel approach that leverages the scientific knowledge and robust code generation capabilities of Large Language Models.
We demonstrate LLM-SR's effectiveness across three diverse scientific domains, where it discovers physically accurate equations.
- Score: 17.64574496035502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical equations have been unreasonably effective in describing complex natural phenomena across various scientific disciplines. However, discovering such insightful equations from data presents significant challenges due to the necessity of navigating extremely high-dimensional combinatorial and nonlinear hypothesis spaces. Traditional methods of equation discovery, commonly known as symbolic regression, largely focus on extracting equations from data alone, often neglecting the rich domain-specific prior knowledge that scientists typically depend on. To bridge this gap, we introduce LLM-SR, a novel approach that leverages the extensive scientific knowledge and robust code generation capabilities of Large Language Models (LLMs) to discover scientific equations from data in an efficient manner. Specifically, LLM-SR treats equations as programs with mathematical operators and combines LLMs' scientific priors with evolutionary search over equation programs. The LLM iteratively proposes new equation skeleton hypotheses, drawing from its physical understanding, which are then optimized against data to estimate skeleton parameters. We demonstrate LLM-SR's effectiveness across three diverse scientific domains, where it discovers physically accurate equations that provide significantly better fits to in-domain and out-of-domain data compared to the well-established symbolic regression baselines. Incorporating scientific prior knowledge also enables LLM-SR to search the equation space more efficiently than baselines. Code is available at: https://github.com/deep-symbolic-mathematics/LLM-SR
Related papers
- AutoTurb: Using Large Language Models for Automatic Algebraic Model Discovery of Turbulence Closure [15.905369652489505]
In this work, a novel framework using LLMs to automatically discover expressions for correcting the Reynolds stress model is proposed.
The proposed method is performed for separated flow over periodic hills at Re = 10,595.
It is demonstrated that the corrective RANS can improve the prediction for both the Reynolds stress and mean velocity fields.
arXiv Detail & Related papers (2024-10-14T16:06:35Z) - MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data [22.262191225577244]
We explore whether a similar approach can be applied to scientific foundation models (SFMs)
We collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries.
We provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrate that pre-training remains robust even with approximated prior data.
arXiv Detail & Related papers (2024-10-09T00:52:00Z) - Symbolic Regression with a Learned Concept Library [9.395222766576342]
We present a novel method for searching for compact programmatic hypotheses that best explain a dataset.
Our algorithm, called LaSR, uses zero-shot queries to a large language model to discover and evolve concepts.
LaSR substantially outperforms a variety of state-of-the-art SR approaches based on deep learning and evolutionary algorithms.
arXiv Detail & Related papers (2024-09-14T08:17:30Z) - Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade when handling problems of complexity.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models [13.136507215114722]
MLLM-SR is a conversational symbolic regression method that can generate expressions that meet the requirements simply by describing the requirements with natural language instructions.
We experimentally demonstrate that MLLM-SR can well understand the prior knowledge we add to the natural language instructions.
arXiv Detail & Related papers (2024-06-08T09:17:54Z) - LLM4ED: Large Language Models for Automatic Equation Discovery [0.8644909837301149]
We introduce a new framework that utilizes natural language-based prompts to guide large language models in automatically mining governing equations from data.
Specifically, we first utilize the generation capability of LLMs to generate diverse equations in string form, and then evaluate the generated equations based on observations.
Experiments are extensively conducted on both partial differential equations and ordinary differential equations.
arXiv Detail & Related papers (2024-05-13T14:03:49Z) - SciGLM: Training Scientific Language Models with Self-Reflective
Instruction Annotation and Tuning [60.14510984576027]
SciGLM is a suite of scientific language models able to conduct college-level scientific reasoning.
We apply a self-reflective instruction annotation framework to generate step-by-step reasoning for unlabelled scientific questions.
We fine-tuned the ChatGLM family of language models with SciInstruct, enhancing their scientific and mathematical reasoning capabilities.
arXiv Detail & Related papers (2024-01-15T20:22:21Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Discovering Interpretable Physical Models using Symbolic Regression and
Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z) - Exploring Equation as a Better Intermediate Meaning Representation for
Numerical Reasoning [53.2491163874712]
We use equations as IMRs to solve the numerical reasoning task.
We present a method called Boosting Numerical Reasontextbfing by Decomposing the Generation of Equations (Bridge)
Our method improves the performance by 2.2%, 0.9%, and 1.7% on GSM8K, SVAMP, and Algebra datasets.
arXiv Detail & Related papers (2023-08-21T09:35:33Z) - Deep learning applied to computational mechanics: A comprehensive
review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention at pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.