Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo
Tree Search
- URL: http://arxiv.org/abs/2401.14424v3
- Date: Tue, 30 Jan 2024 09:27:21 GMT
- Title: Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo
Tree Search
- Authors: Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jingyi Liu, Wenqiang Li, Meilan
Hao, Shu Wei, Yusong Deng
- Abstract summary: We introduce SR-GPT, a novel algorithm for symbolic regression.
It integrates Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT).
- Score: 13.136507215114722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding a concise and interpretable mathematical formula that accurately
describes the relationship between each variable and the predicted value in the
data is a crucial task in scientific research, as well as a significant
challenge in artificial intelligence. This problem is referred to as symbolic
regression, which is an NP-hard problem. In the previous year, a novel symbolic
regression methodology utilizing Monte Carlo Tree Search (MCTS) was proposed,
achieving state-of-the-art results on a diverse range of datasets. Although
this algorithm has shown considerable improvement in recovering target
expressions compared to previous methods, the lack of guidance during the MCTS
process severely hampers its search efficiency. Recently, some algorithms have
added a pre-trained policy network to guide the search of MCTS, but the
pre-trained policy network generalizes poorly. To optimize the trade-off
between efficiency and versatility, we introduce SR-GPT, a novel algorithm for
symbolic regression that integrates Monte Carlo Tree Search (MCTS) with a
Generative Pre-Trained Transformer (GPT). By using GPT to guide the MCTS, the
search efficiency of MCTS is significantly improved. Next, we utilize the MCTS
results to further refine the GPT, enhancing its capabilities and providing
more accurate guidance for the MCTS. MCTS and GPT are coupled together and
optimize each other until the target expression is successfully determined. We
conducted extensive evaluations of SR-GPT using 222 expressions sourced from
over 10 different symbolic regression datasets. The experimental results
demonstrate that SR-GPT outperforms existing state-of-the-art algorithms in
accurately recovering symbolic expressions both with and without added noise.
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a critical accuracy-and-efficiency bottleneck when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning [12.660401635672967]
Finding mathematical formulas from observational data is a major demand of scientific research.
FormulaGPT achieves state-of-the-art performance in fitting ability compared with four baselines.
arXiv Detail & Related papers (2024-04-09T14:08:47Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Transformer-based Planning for Symbolic Regression [18.90700817248397]
We propose TPSR, a Transformer-based Planning strategy for Symbolic Regression.
Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity.
Our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, symbolic abilities, and robustness to noise.
arXiv Detail & Related papers (2023-03-13T03:29:58Z) - Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search [29.392036559507755]
Symbolic regression is a problem of learning a symbolic expression from numerical data.
Deep neural models trained on procedurally-generated synthetic datasets showed competitive performance.
We propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure.
arXiv Detail & Related papers (2023-02-22T09:10:20Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - Zoetrope Genetic Programming for Regression [2.642406403099596]
The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions.
ZGP is validated using a large number of public domain regression datasets.
arXiv Detail & Related papers (2021-02-26T10:47:10Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.