Related papers: Current Challenges of Symbolic Regression: Optimization, Selection, Model Simplification, and Benchmarking

URL: http://arxiv.org/abs/2512.01682v1
Date: Mon, 01 Dec 2025 13:48:07 GMT
Title: Current Challenges of Symbolic Regression: Optimization, Selection, Model Simplification, and Benchmarking
Authors: Guilherme Seidyo Imai Aldeia
Abstract summary: Symbolic Regression (SR) aims to discover mathematical expressions that describe the relationship between variables. Current methods must be constantly re-evaluated to understand the SR landscape. This thesis addresses these challenges through a sequence of studies conducted throughout the doctorate.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Symbolic Regression (SR) is a regression method that aims to discover mathematical expressions that describe the relationship between variables, and it is often implemented through Genetic Programming, a metaphor for the process of biological evolution. Its appeal lies in combining predictive accuracy with interpretable models, but its promise is limited by several long-standing challenges: parameters are difficult to optimize, the selection of solutions can affect the search, and models often grow unnecessarily complex. In addition, current methods must be constantly re-evaluated to understand the SR landscape. This thesis addresses these challenges through a sequence of studies conducted throughout the doctorate, each focusing on an important aspect of the SR search process. First, I investigate parameter optimization, obtaining insights into its role in improving predictive accuracy, albeit with trade-offs in runtime and expression size. Next, I study parent selection, exploring $ε$-lexicase to select parents more likely to generate well-performing offspring. The focus then turns to simplification, where I introduce a novel method based on memoization and locality-sensitive hashing that reduces redundancy and yields simpler, more accurate models. All of these contributions are implemented in a multi-objective evolutionary SR library, which achieves Pareto-optimal performance in terms of accuracy and simplicity on benchmarks of real-world and synthetic problems, outperforming several contemporary SR approaches. The thesis concludes by proposing changes to a famous large-scale symbolic regression benchmark suite, then running the experiments to assess the symbolic regression landscape, demonstrating that an SR method with the contributions presented in this thesis achieves Pareto-optimal performance.
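Illustration: the $ε$-lexicase parent selection mentioned in the abstract can be sketched roughly as below. This is a minimal sketch of one common variant, not the thesis's implementation: it assumes the per-case tolerance ε is the median absolute deviation (MAD) of the population's errors on that case, and the names `mad`, `epsilon_lexicase_select`, and the `errors` matrix layout are hypothetical.

```python
import random
import statistics


def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def epsilon_lexicase_select(errors, rng=random):
    """Return the index of one selected parent.

    errors[i][j] is the absolute error of individual i on training case j
    (assumed layout, chosen for this sketch).
    """
    n_individuals = len(errors)
    n_cases = len(errors[0])

    # Assumption: epsilon for each case is the MAD of that case's errors
    # over the whole population, computed once per selection event.
    eps = [mad([errors[i][j] for i in range(n_individuals)])
           for j in range(n_cases)]

    pool = list(range(n_individuals))
    cases = list(range(n_cases))
    rng.shuffle(cases)  # visit the training cases in random order

    for j in cases:
        if len(pool) == 1:
            break
        # Keep candidates within epsilon of the best error in the current pool.
        best = min(errors[i][j] for i in pool)
        pool = [i for i in pool if errors[i][j] <= best + eps[j]]

    # Ties that survive every case are broken at random.
    return rng.choice(pool)
```

Calling `epsilon_lexicase_select(errors)` once per required parent yields a different random case ordering each time, which is what lets individuals that do well on different subsets of cases be selected.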

Related papers

Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features. Central to our approach is integrating reinforcement learning and causal inference. Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z)
Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations. We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability. We introduce a new SAE training algorithm based on "bias adaptation", a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity.
arXiv Detail & Related papers (2025-06-16T20:58:05Z)
Call for Action: towards the next generation of symbolic regression benchmark [2.7253033812941387]
Symbolic Regression is a powerful technique for discovering interpretable mathematical expressions. Benchmarking SR methods remains challenging due to the diversity of algorithms, datasets, and evaluation criteria.
arXiv Detail & Related papers (2025-05-06T21:02:20Z)
Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$ [2.222138965069487]
Symbolic regression (SR) is a powerful technique for discovering symbolic expressions that characterize nonlinear relationships in data. Existing SR methods do not scale to datasets with a large number of input variables, which is common in modern scientific applications. We propose PAN+SR, which combines ab initio nonparametric variable selection with SR to efficiently pre-screen large input spaces.
arXiv Detail & Related papers (2024-10-17T15:41:06Z)
Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data. Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables. We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization [14.146976111782466]
We present our new approach ParFam to translate the discrete symbolic regression problem into a continuous one. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We also present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam.
arXiv Detail & Related papers (2023-10-09T09:01:25Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Transformer-based Planning for Symbolic Regression [18.90700817248397]
We propose TPSR, a Transformer-based Planning strategy for Symbolic Regression. Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity. Our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, symbolic abilities, and robustness to noise.
arXiv Detail & Related papers (2023-03-13T03:29:58Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear, theoretically and empirically, how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
GSR: A Generalized Symbolic Regression Approach [13.606672419862047]
This paper presents Generalized Symbolic Regression (GSR). We show that our GSR method outperforms several state-of-the-art methods on the well-known Symbolic Regression benchmark problem sets. We highlight the strengths of GSR by introducing SymSet, a new SR benchmark set which is more challenging relative to the existing benchmarks.
arXiv Detail & Related papers (2022-05-31T07:20:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.