Improving Monte Carlo Tree Search for Symbolic Regression
- URL: http://arxiv.org/abs/2509.15929v2
- Date: Wed, 24 Sep 2025 02:41:54 GMT
- Title: Improving Monte Carlo Tree Search for Symbolic Regression
- Authors: Zhengyao Huang, Daniel Zhengyu Huang, Tiannan Xiao, Dina Ma, Zhenyu Ming, Hao Shi, Yuanhui Wen
- Abstract summary: Symbolic regression aims to discover concise, interpretable mathematical expressions that satisfy desired objectives. We propose an improved MCTS framework for symbolic regression that addresses the limitations of traditional bandit strategies and sequential symbol construction through two key innovations. Our approach achieves competitive performance with state-of-the-art libraries in terms of recovery rate and attains favorable positions on the Pareto frontier of accuracy versus model complexity.
- Score: 13.641201012951356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression aims to discover concise, interpretable mathematical expressions that satisfy desired objectives, such as fitting data, which poses a highly combinatorial optimization problem. While genetic programming has been the dominant approach, recent efforts have explored reinforcement learning methods to improve search efficiency. Monte Carlo Tree Search (MCTS), with its ability to balance exploration and exploitation through guided search, has emerged as a promising technique for symbolic expression discovery. However, its traditional bandit strategies and sequential symbol construction often limit performance. In this work, we propose an improved MCTS framework for symbolic regression that addresses these limitations through two key innovations: (1) an extreme bandit allocation strategy tailored for identifying globally optimal expressions, with finite-time performance guarantees under polynomial reward decay assumptions; and (2) evolution-inspired state-jumping actions such as mutation and crossover, which enable non-local transitions to promising regions of the search space. These state-jumping actions also reshape the reward landscape during the search, improving both robustness and efficiency. We conduct a thorough numerical study of the impact of these improvements and benchmark our approach against existing symbolic regression methods on a variety of datasets, including both ground-truth and black-box datasets. Our approach achieves competitive performance with state-of-the-art libraries in terms of recovery rate and attains favorable positions on the Pareto frontier of accuracy versus model complexity. Code is available at https://github.com/PKU-CMEGroup/MCTS-4-SR.
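The two innovations in the abstract can be illustrated in miniature. The sketch below is a toy reconstruction, not the code in the linked repository: the prefix grammar, the constants, and the `jump_p` parameter are all invented for illustration. It runs MCTS over prefix-notation expressions, selects children by the maximum reward observed beneath them (an extreme-bandit statistic, in place of the usual mean), and occasionally replaces a random rollout with a mutation-style state jump from the best expression found so far.

```python
import math
import random

# Toy regression target: recover f(x) = x*x + x from samples.
XS = [i / 10 for i in range(-20, 21)]
YS = [x * x + x for x in XS]

ARITY = {"+": 2, "*": 2, "x": 0, "1": 0}  # prefix-notation grammar
MAX_LEN = 15

def holes(tokens):
    """Number of still-unfilled argument slots in a prefix sequence."""
    need = 1
    for t in tokens:
        need += ARITY[t] - 1
    return need

def evaluate(tokens, x):
    """Evaluate a complete prefix sequence recursively."""
    def rec(i):
        t = tokens[i]
        if ARITY[t] == 0:
            return (x if t == "x" else 1.0), i + 1
        a, j = rec(i + 1)
        b, k = rec(j)
        return (a + b if t == "+" else a * b), k
    return rec(0)[0]

def reward(tokens):
    """Bounded fitness in (0, 1]: 1 / (1 + RMSE)."""
    err = sum((evaluate(tokens, x) - y) ** 2 for x, y in zip(XS, YS))
    return 1.0 / (1.0 + math.sqrt(err / len(XS)))

def rollout(tokens):
    """Complete a partial sequence with random tokens, if possible."""
    toks = list(tokens)
    while holes(toks) > 0 and len(toks) < MAX_LEN:
        toks.append(random.choice(list(ARITY)))
    return toks if holes(toks) == 0 else None

def mutate(tokens):
    """State-jumping action: re-randomise a suffix of a complete expression."""
    cut = random.randrange(len(tokens))
    return rollout(tokens[:cut])

class Node:
    def __init__(self, tokens):
        self.tokens = tokens
        self.children = []
        self.visits = 0
        self.max_r = 0.0  # extreme-bandit statistic: best reward seen below

def search(iters=3000, c=0.3, jump_p=0.2):
    root, best, best_r = Node([]), None, 0.0
    for _ in range(iters):
        node, path = root, [root]
        # Selection: UCB over the MAX child reward, not the mean.
        while node.children:
            node = max(node.children, key=lambda ch: ch.max_r
                       + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1)))
            path.append(node)
        # Expansion.
        if holes(node.tokens) > 0 and len(node.tokens) < MAX_LEN:
            node.children = [Node(node.tokens + [t]) for t in ARITY]
            node = random.choice(node.children)
            path.append(node)
        # Simulation, occasionally replaced by a mutation jump from the best.
        if best and random.random() < jump_p:
            toks = mutate(best)
        else:
            toks = rollout(node.tokens)
        r = reward(toks) if toks else 0.0
        if r > best_r:
            best, best_r = toks, r
        # Backpropagation of the extreme statistic.
        for n in path:
            n.visits += 1
            n.max_r = max(n.max_r, r)
    return best, best_r

if __name__ == "__main__":
    toks, r = search()
    print("best prefix expression:", toks, "reward ~", round(r, 3))
```

Tracking the maximum rather than the mean is what makes the bandit "extreme": symbolic regression cares only about the single best expression ever found, so an arm's average payoff is the wrong statistic to allocate visits by.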
Related papers
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically relies on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z)
- Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning [14.473539776112666]
We propose a novel framework named EGRL-SR (Experience-driven Goal-conditioned Reinforcement Learning for Symbolic Regression). We formulate symbolic regression as a goal-conditioned reinforcement learning problem and incorporate hindsight experience replay. We design an all-point satisfaction binary reward function that encourages the action-value network to focus on structural patterns rather than low-error expressions.
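A minimal sketch of the hindsight-relabelling idea this summary describes, under our own assumption (not stated in the paper) that goals are the output values an expression should reproduce; all names here are hypothetical.

```python
from collections import deque

def binary_reward(preds, goal, tol=1e-6):
    """All-point satisfaction: 1 only if EVERY point matches the goal."""
    return 1.0 if all(abs(p - g) <= tol for p, g in zip(preds, goal)) else 0.0

replay = deque(maxlen=10_000)

def store_with_hindsight(state, action, preds, goal):
    """Store the real transition plus a hindsight-relabelled copy: pretend
    the outputs we actually achieved were the goal, yielding reward 1."""
    replay.append((state, action, tuple(goal), binary_reward(preds, goal)))
    replay.append((state, action, tuple(preds), 1.0))  # relabelled goal

# Hypothetical usage: 'preds' are an expression's outputs on the data.
store_with_hindsight(state=("+",), action="x",
                     preds=[0.1, 0.4], goal=[0.0, 0.5])
print(len(replay), "transitions;", replay[-1][-1], "reward after relabelling")
```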
arXiv Detail & Related papers (2026-01-21T06:08:37Z)
- Current Challenges of Symbolic Regression: Optimization, Selection, Model Simplification, and Benchmarking [0.0]
Symbolic Regression (SR) aims to discover mathematical expressions that describe the relationship between variables. Current methods must be constantly re-evaluated to understand the SR landscape. This thesis addresses these challenges through a sequence of studies conducted throughout the doctorate.
arXiv Detail & Related papers (2025-12-01T13:48:07Z)
- Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning [3.6333725470852443]
We explore how Monte Carlo Tree Search can be repurposed to improve policy optimization in preference-based reinforcement learning. We propose a staged GRPO training paradigm where completions are derived from partially revealed MCTS rollouts, introducing a novel tree-structured setting for advantage estimation. Our initial results indicate that while structured advantage estimation can stabilize training and better reflect reasoning quality, challenges such as advantage saturation and reward signal collapse remain.
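For readers unfamiliar with GRPO-style advantages, a minimal sketch follows; the tree prefixes and rewards are invented, and the second group deliberately shows the advantage-saturation failure mode the summary mentions (identical rewards yield all-zero advantages).

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: standardise rewards within the group of
    completions sharing a prompt (here, a shared MCTS prefix)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against collapse
    return [(r - mu) / sigma for r in rewards]

# Completions grouped by the partially revealed tree prefix they extend.
tree_groups = {
    ("step1",): [0.0, 1.0, 1.0],          # rollouts from a shallow prefix
    ("step1", "step2"): [1.0, 1.0, 1.0],  # saturated: all advantages -> 0
}
for prefix, rs in tree_groups.items():
    print(prefix, [round(a, 2) for a in grpo_advantages(rs)])
```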
arXiv Detail & Related papers (2025-09-11T09:18:07Z)
- Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling [53.11315884128402]
Temporal Knowledge Graph (TKG) completion models traditionally assume access to the entire graph during training. We present an incremental training framework specifically designed for TKGs, aiming to address entities that are either not observed during training or have sparse connections. Our approach combines a model-agnostic enhancement layer with a weighted sampling strategy that can be added to, and improve, any existing TKG completion method.
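One plausible reading of the weighted sampling strategy, sketched under our own assumptions rather than the paper's exact scheme: inverse-frequency weights over entities, so long-tail entities appear disproportionately often in training batches.

```python
import random
from collections import Counter

# Hypothetical TKG quadruples: (head, relation, tail, timestamp).
quads = [("a", "r", "b", 0), ("a", "r", "c", 1), ("a", "r", "d", 2),
         ("e", "r", "f", 3)]  # "e" and "f" are long-tail entities

freq = Counter()
for h, _, t, _ in quads:
    freq[h] += 1
    freq[t] += 1

# Weight each quadruple by the inverse frequency of its entities, so
# sparsely connected entities are sampled more often than their share.
weights = [1.0 / freq[h] + 1.0 / freq[t] for h, _, t, _ in quads]
batch = random.choices(quads, weights=weights, k=8)
print(sum(q[0] == "e" for q in batch), "of 8 samples touch long-tail 'e'")
```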
arXiv Detail & Related papers (2025-07-25T06:02:48Z)
- Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [59.6658995479243]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to avoid forgetting. Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
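The merging step reduces to a convex combination of parameters. The sketch below hand-supplies the coefficient where P&M derives one analytically; the values are toy data.

```python
def merge(theta_old, theta_new, lam):
    """Convex parameter-space merge: lam trades stability (old tasks)
    against plasticity (the new task). P&M derives lam analytically;
    here it is just a supplied constant."""
    return [lam * o + (1.0 - lam) * n for o, n in zip(theta_old, theta_new)]

# Toy 3-parameter model before and after training on a new task.
theta_prev = [0.2, -1.0, 0.5]  # consolidated weights for old tasks
theta_task = [0.6, -0.4, 0.1]  # weights after fine-tuning on the new task
print(merge(theta_prev, theta_task, lam=0.7))
```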
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
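A minimal sketch of a retrieve-reformulate-answer loop in the spirit of CoRAG; the stub retriever and reformulation policy are placeholders of our own, not the trained models the paper uses.

```python
def corag_style_loop(question, retrieve, reformulate, answer, max_steps=4):
    """Chain of retrieval: retrieve, let the model restate the query in
    light of what it has read, and repeat until it is ready to answer."""
    query, evidence = question, []
    for _ in range(max_steps):
        evidence.extend(retrieve(query))
        query = reformulate(question, evidence)  # dynamic reformulation
        if query is None:                        # model signals "enough"
            break
    return answer(question, evidence)

# Stub components standing in for a retriever and an LLM.
docs = {"capital France": ["Paris is the capital of France."]}
out = corag_style_loop(
    "capital France",
    retrieve=lambda q: docs.get(q, []),
    reformulate=lambda q, ev: None if ev else q,
    answer=lambda q, ev: ev[0] if ev else "unknown",
)
print(out)
```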
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
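Compressed to its core, the framework alternates refinement with pairwise comparison. The toy below drops the tree search entirely and keeps only that loop, with invented stand-ins for the refiner and the preference judge, so it illustrates the pairwise-optimization idea rather than the framework itself.

```python
import random

def refine_search(seed, refine, prefer, iters=20):
    """Keep a current best solution, refine it, and keep whichever of the
    pair wins a pairwise comparison (no absolute score is ever needed)."""
    best = seed
    for _ in range(iters):
        candidate = refine(best)
        if prefer(candidate, best):  # pairwise, not absolute, judgement
            best = candidate
    return best

# Toy setting: 'solutions' are numbers, the target is 42, refinement
# nudges a solution, and preference picks whichever is closer.
target = 42.0
winner = refine_search(
    seed=0.0,
    refine=lambda s: s + random.uniform(-1.0, 5.0),
    prefer=lambda a, b: abs(a - target) < abs(b - target),
)
print(round(winner, 2))
```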
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
- Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients [20.941908494137806]
We propose a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Our work is aligned with the popular DSR framework, which focuses on learning a data-specific expression generator.
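The risk-seeking policy gradient in the title trains on only the top-quantile samples of each batch. A minimal sketch of that filtering step with made-up rewards; this shows the objective, not the authors' full training loop.

```python
def risk_seeking_weights(rewards, epsilon=0.2):
    """Risk-seeking policy gradient filter: only samples above the
    (1 - epsilon) quantile contribute, pushing the BEST-case reward up
    rather than the average (the DSR-style objective this entry builds on)."""
    cutoff = sorted(rewards)[int((1 - epsilon) * len(rewards))]
    return [max(r - cutoff, 0.0) for r in rewards]  # quantile as baseline

batch = [0.1, 0.5, 0.55, 0.2, 0.9, 0.3, 0.62, 0.05, 0.7, 0.4]
print(risk_seeking_weights(batch))  # non-zero only above the quantile
```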
arXiv Detail & Related papers (2024-06-10T19:29:10Z)
- SPO: Sequential Monte Carlo Policy Optimisation [41.52684912140086]
We introduce SPO: Sequential Monte Carlo Policy Optimisation. We show that SPO provides robust policy improvement and efficient scaling properties, and demonstrate statistically significant improvements in performance relative to model-free and model-based baselines.
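Sequential Monte Carlo, reduced to a sketch: maintain a particle population of partial plans, weight each by a score, resample, and extend the survivors. The candidates and scoring function below are toy stand-ins, not SPO's actual components.

```python
import random

def smc_action_search(candidates, score, steps=3, n=8):
    """Miniature SMC loop over action sequences: extend, weight,
    resample. Good plans are duplicated, bad ones are dropped."""
    particles = [[] for _ in range(n)]
    for _ in range(steps):
        particles = [p + [random.choice(candidates)] for p in particles]
        weights = [score(p) for p in particles]
        total = sum(weights) or 1.0
        # Resampling step: draw a new population proportional to weight.
        particles = random.choices(
            particles, weights=[w / total for w in weights], k=n)
    return max(particles, key=score)

# Toy objective: prefer plans containing many 1s (score kept positive).
best = smc_action_search([0, 1], score=lambda p: sum(p) + 0.1)
print(best)
```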
arXiv Detail & Related papers (2024-02-12T10:32:47Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward datasets for coding tasks, and we observe similarly improved performance on code generation tasks.
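A minimal sketch of greedy step-level search under a process reward model; the proposer and reward function below are hard-coded stubs of our own, standing in for an LLM and a trained PRM.

```python
def greedy_prm_search(state, propose, step_reward, max_depth=5):
    """Greedy decoding over reasoning steps: at each depth, generate
    candidate next steps and keep the one the (stub) PRM scores highest."""
    path = [state]
    for _ in range(max_depth):
        candidates = propose(path)
        if not candidates:
            break
        path.append(max(candidates, key=lambda s: step_reward(path, s)))
    return path

# Stubs standing in for an LLM proposer and a process-supervised RM.
print(greedy_prm_search(
    state="2+3*4 = ?",
    propose=lambda p: ["3*4 = 12", "2+3 = 5"] if len(p) == 1
                      else (["2+12 = 14"] if p[-1] == "3*4 = 12" else []),
    step_reward=lambda p, s: 1.0 if "12" in s else 0.0,
))
```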
arXiv Detail & Related papers (2023-10-16T05:21:50Z)