Related papers: Knowledge Integration for Physics-informed Symbolic Regression Using Pre-trained Large Language Models

Knowledge Integration for Physics-informed Symbolic Regression Using Pre-trained Large Language Models

URL: http://arxiv.org/abs/2509.03036v1
Date: Wed, 03 Sep 2025 05:53:40 GMT
Title: Knowledge Integration for Physics-informed Symbolic Regression Using Pre-trained Large Language Models
Authors: Bilge Taskin, Wenxiong Xie, Teddy Lazebnik,
Abstract summary: Symbolic regression (SR) has emerged as a powerful tool for automated scientific discovery.<n>Current methods often require specialized formulations and manual feature engineering.<n>In this study, we leverage pre-trained Large Language Models (LLMs) to facilitate knowledge integration in PiSR.
Score: 4.817429789586127
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Symbolic regression (SR) has emerged as a powerful tool for automated scientific discovery, enabling the derivation of governing equations from experimental data. A growing body of work illustrates the promise of integrating domain knowledge into the SR to improve the discovered equation's generality and usefulness. Physics-informed SR (PiSR) addresses this by incorporating domain knowledge, but current methods often require specialized formulations and manual feature engineering, limiting their adaptability only to domain experts. In this study, we leverage pre-trained Large Language Models (LLMs) to facilitate knowledge integration in PiSR. By harnessing the contextual understanding of LLMs trained on vast scientific literature, we aim to automate the incorporation of domain knowledge, reducing the need for manual intervention and making the process more accessible to a broader range of scientific problems. Namely, the LLM is integrated into the SR's loss function, adding a term of the LLM's evaluation of the SR's produced equation. We extensively evaluate our method using three SR algorithms (DEAP, gplearn, and PySR) and three pre-trained LLMs (Falcon, Mistral, and LLama 2) across three physical dynamics (dropping ball, simple harmonic motion, and electromagnetic wave). The results demonstrate that LLM integration consistently improves the reconstruction of physical dynamics from data, enhancing the robustness of SR models to noise and complexity. We further explore the impact of prompt engineering, finding that more informative prompts significantly improve performance.

Related papers

Agentic-KGR: Co-evolutionary Knowledge Graph Construction through Multi-Agent Reinforcement Learning [6.665920297143511]
Agentic-KGR is a novel framework enabling co-evolution between large language models (LLMs) and knowledge graphs (KGs)<n>Our approach introduces three key innovations: (1) a dynamic schema expansion mechanism that systematically extends graph beyond pre-defined boundaries during training; (2) a retrieval-augmented memory system enabling synergistic co-evolution between model parameters and knowledge structures through continuous optimization; and (3) a learnable multi-scale prompt compression approach that preserves critical information while reducing computational complexity through adaptive sequence optimization.
arXiv Detail & Related papers (2025-10-10T09:00:07Z)
TRAIL: Joint Inference and Refinement of Knowledge Graphs with Large Language Models [5.678291291711662]
TRAIL is a novel, unified framework for Thinking, Reasoning, And Incremental Learning.<n>It couples joint inference and dynamic KG refinement with large language models.<n>Extensive experiments on multiple benchmarks demonstrate that TRAIL outperforms existing KG-augmented and retrieval-augmented LLM baselines by 3% to 13%.
arXiv Detail & Related papers (2025-08-06T14:25:05Z)
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience [14.093206703519103]
DrSR is a framework that combines data-driven insight with reflective learning to enhance both robustness and discovery capability.<n> Experiments across interdisciplinary datasets in physics, chemistry, biology, and materials science demonstrate that DrSR substantially improves the valid equation rate.
arXiv Detail & Related papers (2025-06-04T04:52:34Z)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
KBM: Delineating Knowledge Boundary for Adaptive Retrieval in Large Language Models [69.99274367773997]
Large Language Models (LLMs) often struggle with dynamically changing knowledge and handling unknown static information.<n>Retrieval-Augmented Generation (RAG) is employed to tackle these challenges and has a significant impact on improving LLM performance.<n>We propose a Knowledge Boundary Model (KBM) to express the known/unknown of a given question, and to determine whether a RAG needs to be triggered.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations. Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs [10.380692079063467]
We propose WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process.
arXiv Detail & Related papers (2024-08-14T15:19:16Z)
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives. Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance. In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models [17.64574496035502]
Current methods of equation discovery, commonly known as symbolic regression, largely focus on extracting equations from data alone.<n>We introduce LLM-SR, a novel approach that leverages the scientific knowledge and robust code generation capabilities of Large Language Models.<n>We show that LLM-SR discovers physically accurate equations that significantly outperform state-of-the-art symbolic regression baselines.
arXiv Detail & Related papers (2024-04-29T03:30:06Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model [97.71181484082663]
We introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture. Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking.
arXiv Detail & Related papers (2024-02-23T18:45:35Z)
A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs) We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
arXiv Detail & Related papers (2024-02-03T04:45:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.