Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo
Tree Search
- URL: http://arxiv.org/abs/2401.14424v3
- Date: Tue, 30 Jan 2024 09:27:21 GMT
- Title: Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo
Tree Search
- Authors: Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jingyi Liu, Wenqiang Li, Meilan
Hao, Shu Wei, Yusong Deng
- Abstract summary: We introduce SR-GPT, a novel algorithm for symbolic regression.
It integrates Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT).
- Score: 13.136507215114722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding a concise and interpretable mathematical formula that accurately
describes the relationship between each variable and the predicted value in the
data is a crucial task in scientific research, as well as a significant
challenge in artificial intelligence. This problem is referred to as symbolic
regression, which is an NP-hard problem. In the previous year, a novel symbolic
regression methodology utilizing Monte Carlo Tree Search (MCTS) was proposed,
achieving state-of-the-art results on a diverse range of datasets. Although
this algorithm has shown considerable improvement in recovering target
expressions compared to previous methods, the lack of guidance during the MCTS
process severely hampers its search efficiency. Recently, some algorithms have
added a pre-trained policy network to guide the search of MCTS, but the
pre-trained policy network generalizes poorly. To optimize the trade-off
between efficiency and versatility, we introduce SR-GPT, a novel algorithm for
symbolic regression that integrates Monte Carlo Tree Search (MCTS) with a
Generative Pre-Trained Transformer (GPT). By using GPT to guide the MCTS, the
search efficiency of MCTS is significantly improved. Next, we utilize the MCTS
results to further refine the GPT, enhancing its capabilities and providing
more accurate guidance for the MCTS. MCTS and GPT are coupled together and
optimize each other until the target expression is successfully determined. We
conducted extensive evaluations of SR-GPT using 222 expressions sourced from
over 10 different symbolic regression datasets. The experimental results
demonstrate that SR-GPT outperforms existing state-of-the-art algorithms in
accurately recovering symbolic expressions both with and without added noise.
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a critical accuracy-and-efficiency bottleneck when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning [12.660401635672967]
Finding mathematical formulas from observational data is a major demand of scientific research.
FormulaGPT achieves state-of-the-art performance in fitting ability compared with four baselines.
arXiv Detail & Related papers (2024-04-09T14:08:47Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Transformer-based Planning for Symbolic Regression [18.90700817248397]
We propose TPSR, a Transformer-based Planning strategy for Symbolic Regression.
Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity.
Our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, symbolic abilities, and robustness to noise.
arXiv Detail & Related papers (2023-03-13T03:29:58Z) - Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search [29.392036559507755]
Symbolic regression is a problem of learning a symbolic expression from numerical data.
Deep neural models trained on procedurally-generated synthetic datasets showed competitive performance.
We propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure.
arXiv Detail & Related papers (2023-02-22T09:10:20Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - Zoetrope Genetic Programming for Regression [2.642406403099596]
The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions.
ZGP is validated using a large number of public domain regression datasets.
arXiv Detail & Related papers (2021-02-26T10:47:10Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.