Scaling Up Unbiased Search-based Symbolic Regression
- URL: http://arxiv.org/abs/2506.19626v1
- Date: Tue, 24 Jun 2025 13:47:19 GMT
- Title: Scaling Up Unbiased Search-based Symbolic Regression
- Authors: Paul Kahlmeyer, Joachim Giesen, Michael Habeck, Henrik Voigt,
- Abstract summary: We show that systematically searching spaces of small expressions finds solutions more accurate than those obtained by state-of-the-art symbolic regression methods. In particular, systematic search outperforms state-of-the-art symbolic regressors in terms of its ability to recover the true underlying symbolic expressions.
- Score: 10.896025071832051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a regression task, a function is learned from labeled data to predict the labels at new data points. The goal is to achieve small prediction errors. In symbolic regression, the goal is more ambitious, namely, to learn an interpretable function that makes small prediction errors. This additional goal largely rules out the standard approach used in regression, that is, reducing the learning problem to learning parameters of an expansion of basis functions by optimization. Instead, symbolic regression methods search for a good solution in a space of symbolic expressions. To cope with the typically vast search space, most symbolic regression methods make implicit, or sometimes even explicit, assumptions about its structure. Here, we argue that the only obvious structure of the search space is that it contains small expressions, that is, expressions that can be decomposed into a few subexpressions. We show that systematically searching spaces of small expressions finds solutions that are more accurate and more robust against noise than those obtained by state-of-the-art symbolic regression methods. In particular, systematic search outperforms state-of-the-art symbolic regressors in terms of its ability to recover the true underlying symbolic expressions on established benchmark data sets.
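To make the core idea concrete, here is a minimal sketch of systematically searching a space of small expressions, i.e., expressions built from a few subexpressions. The grammar, operator set, and depth limit below are illustrative assumptions, not the authors' actual search space:

```python
import itertools
import numpy as np

# Toy building blocks; the paper's actual operator set and decomposition differ.
UNARY = {"sin": np.sin, "exp": np.exp, "sqrt": lambda z: np.sqrt(np.abs(z))}
BINARY = {"+": np.add, "-": np.subtract, "*": np.multiply}

def enumerate_expressions(x, rounds=2):
    """Systematically build all expressions reachable in `rounds` compositions."""
    level = [("x", x)]
    for _ in range(rounds):
        new = [(f"{n}({d})", f(v)) for (d, v) in level for n, f in UNARY.items()]
        new += [(f"({da} {n} {db})", f(va, vb))
                for (da, va), (db, vb) in itertools.product(level, repeat=2)
                for n, f in BINARY.items()]
        level += new
    return level

def systematic_search(x, y):
    """Return the small expression with the lowest mean squared error on (x, y)."""
    return min(enumerate_expressions(x),
               key=lambda e: np.mean((e[1] - y) ** 2))[0]

x = np.linspace(0.1, 3.0, 100)
print(systematic_search(x, np.sin(x) * x))  # prints an expression equivalent to x * sin(x)
```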
Related papers
- (Exhaustive) Symbolic Regression and model selection by minimum description length [0.0]
Symbolic regression is a machine learning method for learning functions from data. Traditional algorithms have an unknown (and likely significant) probability of failing to find any given good function. I propose an exhaustive search and model selection by the minimum description length principle.
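Minimum description length model selection trades off fit against complexity by scoring each candidate by its total codelength. Below is a hedged sketch of one common decomposition, L(model) + L(data | model); the paper's exact coding scheme is not reproduced here:

```python
import numpy as np

def description_length(n_nodes, n_symbols, constants, residuals):
    """Total codelength of a candidate expression; smaller is better.
    Illustrative MDL decomposition, not the paper's exact coding scheme."""
    structure_bits = n_nodes * np.log2(n_symbols)          # encode the tree shape
    constant_bits = sum(0.5 * np.log2(1.0 + c * c) + 1.0   # crude cost per constant
                        for c in constants)
    n = len(residuals)
    sigma2 = max(float(np.mean(residuals ** 2)), 1e-12)
    data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)  # Gaussian NLL in bits
    return structure_bits + constant_bits + data_bits

rng = np.random.default_rng(0)
# A 3-node model with one constant vs. a 9-node model that fits only slightly better.
print(description_length(3, 8, [2.5], rng.normal(0.0, 0.10, 200)))
print(description_length(9, 8, [2.5, -1.3, 0.7], rng.normal(0.0, 0.09, 200)))
```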
arXiv Detail & Related papers (2025-07-17T12:04:15Z)
- Dimension Reduction for Symbolic Regression [11.391067888033765]
One measure for evaluating symbolic regression algorithms is their ability to recover formulae, up to symbolic equivalence, from finite samples. We show that our dimension reduction approach reliably identifies valid substitutions and significantly boosts the performance of different types of state-of-the-art symbolic regression algorithms.
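As an illustration of what "identifying a valid substitution" means, the toy test below checks whether the target depends on the raw inputs only through a candidate combined feature. This is a hypothetical check for intuition, not the paper's actual procedure:

```python
import numpy as np

def substitution_is_valid(X, y, g, tol=1e-3):
    """Heuristic check: does y depend on the columns of X only through g(X)?
    If so, samples with nearly equal g(X) must have nearly equal targets,
    even though their raw inputs differ. (Illustrative, not the paper's method.)"""
    z = g(X)
    order = np.argsort(z)
    y_sorted = y[order]
    close = np.abs(np.diff(z[order])) < tol        # near-duplicate feature values
    return bool(close.any()) and np.allclose(np.diff(y_sorted)[close], 0.0, atol=tol)

rng = np.random.default_rng(1)
X = rng.uniform(0.5, 2.0, size=(5000, 2))
y = np.sin(X[:, 0] * X[:, 1])                      # depends on inputs only via x1*x2
print(substitution_is_valid(X, y, lambda X: X[:, 0] * X[:, 1]))  # True
print(substitution_is_valid(X, y, lambda X: X[:, 0] + X[:, 1]))  # likely False
```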
arXiv Detail & Related papers (2025-06-24T11:46:05Z)
- What should an AI assessor optimise for? [57.96463917842822]
An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system. Here we address the question: is it always optimal to train the assessor for the target metric? We experimentally explore this question for, respectively, regression losses and classification scores with monotonic and non-monotonic mappings.
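A toy version of that question (a hypothetical setup, not the paper's experiments): train one assessor directly on the target loss and another on a monotonic transform of it, then compare both on held-out data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
base_pred = X[:, 0]                    # a fixed base model's predictions
y_true = X[:, 0] + 0.5 * X[:, 1] ** 2  # structure the base model misses
loss = (base_pred - y_true) ** 2       # target indicator: per-example loss

# Assessor A: trained directly on the target metric (the raw loss).
direct = RandomForestRegressor(random_state=0).fit(X[:1000], loss[:1000])
# Assessor B: trained on a monotonic transform, mapped back at prediction time.
logged = RandomForestRegressor(random_state=0).fit(X[:1000], np.log1p(loss[:1000]))

mse = lambda p: np.mean((p - loss[1000:]) ** 2)
print("direct         :", mse(direct.predict(X[1000:])))
print("log-transformed:", mse(np.expm1(logged.predict(X[1000:]))))
```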
arXiv Detail & Related papers (2025-02-01T08:41:57Z)
- Scalable Sparse Regression for Model Discovery: The Fast Lane to Insight [0.0]
Sparse regression applied to symbolic libraries has quickly emerged as a powerful tool for learning governing equations directly from data.
I present a general-purpose, model-agnostic sparse regression algorithm that extends a recently proposed exhaustive search.
It is intended to maintain sensitivity to small coefficients and to be of reasonable computational cost for large symbolic libraries.
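For context, the sketch below shows the basic pattern of sparse regression over a symbolic library, using sequentially thresholded least squares (a SINDy-style baseline; the paper's SVD-based exhaustive-search extension is not reproduced here):

```python
import numpy as np

def stlsq(Theta, y, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small coefficients,
    refit on the surviving library terms. (Baseline sketch, not this paper.)"""
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if active.any():
            xi[active] = np.linalg.lstsq(Theta[:, active], y, rcond=None)[0]
    return xi

# Library of candidate terms evaluated on data x.
x = np.linspace(-2, 2, 200)
Theta = np.column_stack([np.ones_like(x), x, x**2, np.sin(x), np.exp(x)])
names = ["1", "x", "x^2", "sin(x)", "exp(x)"]
y = 1.5 * x**2 - 0.8 * np.sin(x)      # sparse ground truth

xi = stlsq(Theta, y)
print({n: round(float(c), 3) for n, c in zip(names, xi) if c != 0.0})
```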
arXiv Detail & Related papers (2024-05-14T18:09:43Z)
- The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version [0.0]
We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings.
This enables us to quantify the success probability of finding the best possible expressions.
We compare the search efficiency of genetic programming to random search in the space of semantically unique expressions.
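The random-search baseline referred to here can be made concrete with a toy grammar. The sketch below estimates the success probability of plain random search; the paper's grammar, semantic deduplication, and budgets differ:

```python
import random
import numpy as np

OPS = [(np.add, "+"), (np.subtract, "-"), (np.multiply, "*")]

def random_expression(x, depth=2):
    """Sample a random expression tree over {x, small constants} (toy grammar)."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.7:
            return x, "x"
        c = round(random.uniform(-2.0, 2.0), 1)
        return np.full_like(x, c), str(c)
    f, sym = random.choice(OPS)
    va, da = random_expression(x, depth - 1)
    vb, db = random_expression(x, depth - 1)
    return f(va, vb), f"({da} {sym} {db})"

def success_probability(x, y, budget=500, runs=20, tol=1e-8):
    """Fraction of independent random-search runs that hit the target."""
    hits = sum(any(np.mean((random_expression(x)[0] - y) ** 2) < tol
                   for _ in range(budget))
               for _ in range(runs))
    return hits / runs

x = np.linspace(-1.0, 1.0, 50)
print(success_probability(x, x * x + x))   # target: x^2 + x
```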
arXiv Detail & Related papers (2024-04-26T09:49:32Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl [0.0]
PySR is an open-source library for practical symbolic regression.
It is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages.
In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science.
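PySR exposes a scikit-learn-style interface. A typical usage sketch follows; the operator lists and iteration count are arbitrary choices here, and defaults may vary across versions:

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(200, 2)
y = 2.5 * np.cos(X[:, 0]) + X[:, 1] ** 2

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp", "sin"],
)
model.fit(X, y)   # search runs on the Julia back-end (SymbolicRegression.jl)
print(model)      # Pareto front of discovered equations by complexity and loss
```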
arXiv Detail & Related papers (2023-05-02T16:31:35Z)
- Online Symbolic Regression with Informative Query [23.684346197490605]
We propose QUOSR, a framework for online symbolic regression.
At each step, QUOSR receives historical data points, generates new inputs $\mathbf{x}$, and then queries the symbolic expression to get the corresponding $y$.
We show that QUOSR can facilitate modern symbolic regression methods by generating informative data.
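The online setting can be summarized as a query loop. The skeleton below is a generic stand-in: `propose` plays the role of QUOSR's learned query generator (here just a hypothetical heuristic), and `fit` is any downstream symbolic regressor:

```python
import numpy as np

def online_sr(oracle, propose, fit, n_queries=50, dim=2):
    """Generic online symbolic-regression loop (illustrative skeleton)."""
    X = np.random.uniform(-1, 1, size=(5, dim))   # small random seed set
    y = oracle(X)
    for _ in range(n_queries):
        x_new = propose(X, y)                     # generate an informative query
        X = np.vstack([X, x_new])
        y = np.append(y, oracle(x_new[None, :]))  # query the hidden expression
    return fit(X, y)

def propose(X, y):
    """Toy heuristic: perturb the sample with the most extreme target value."""
    return X[np.argmax(np.abs(y))] + np.random.normal(scale=0.1, size=X.shape[1])

result = online_sr(lambda X: np.sin(X[:, 0]) * X[:, 1],
                   propose,
                   fit=lambda X, y: f"{len(y)} labeled points collected")
print(result)
```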
arXiv Detail & Related papers (2023-02-21T09:13:48Z)
- Vector-Valued Least-Squares Regression under Output Regularity Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite-dimensional output.
We derive learning bounds for our method, and study under which settings statistical performance is improved in comparison to the full-rank method.
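In the finite-dimensional case the reduced-rank least-squares estimator has a classical closed form, which conveys the flavour of the method; the paper's contribution, learning bounds for infinite-dimensional outputs, is not captured by this sketch:

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank least squares: solve OLS, then project the
    fitted outputs onto their top-`rank` singular subspace."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)       # full-rank solution
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]                         # rank-r output projector
    return B_ols @ P

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
B_true = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 40))  # rank-2 ground truth
Y = X @ B_true + 0.1 * rng.normal(size=(500, 40))

B_hat = reduced_rank_regression(X, Y, rank=2)
print(np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))  # small relative error
```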
arXiv Detail & Related papers (2022-11-16T15:07:00Z)
- Neural Symbolic Regression that Scales [58.45115548924735]
We introduce the first symbolic regression method that leverages large-scale pre-training.
We procedurally generate an unbounded set of equations and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output pairs.
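A sketch of the pre-training data pipeline this describes: procedurally sample a random equation, tabulate input-output pairs, and use the serialized expression as the prediction target. The generator below is a toy sympy-based stand-in, not the paper's sampler or model:

```python
import random
import numpy as np
import sympy as sp

X_SYM = sp.Symbol("x")

def sample_equation(max_ops=3):
    """Procedurally sample one random symbolic equation (toy grammar)."""
    expr = X_SYM
    for _ in range(random.randint(1, max_ops)):
        op = random.choice(["+c", "*c", "sin", "exp"])
        c = sp.Integer(random.randint(1, 3))
        expr = {"+c": expr + c, "*c": expr * c,
                "sin": sp.sin(expr), "exp": sp.exp(expr)}[op]
    return expr

def make_training_pair(n_points=64):
    """One (input-output table, target token string) pre-training example."""
    expr = sample_equation()
    f = sp.lambdify(X_SYM, expr, "numpy")
    xs = np.random.uniform(-1.0, 1.0, n_points)
    return np.stack([xs, f(xs)]), sp.srepr(expr)

table, target = make_training_pair()
print(table.shape, target)   # e.g. (2, 64) Mul(Integer(3), Symbol('x'))
```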
arXiv Detail & Related papers (2021-06-11T14:35:22Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantee learning a good representation.
We prove that the linear layer yields a small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that fits a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
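The DC parametrization itself is short: every continuous piecewise-linear function can be written as a difference of two max-affine (convex piecewise-linear) functions. Below is a minimal fitting sketch with a generic gradient-descent loop, using PyTorch as an assumed tool rather than the paper's solver:

```python
import torch

class DCPiecewiseLinear(torch.nn.Module):
    """y(x) = max_i(a_i . x + b_i) - max_j(c_j . x + d_j)."""
    def __init__(self, dim, pieces=8):
        super().__init__()
        self.f = torch.nn.Linear(dim, pieces)   # affine pieces of the convex part
        self.g = torch.nn.Linear(dim, pieces)   # affine pieces of the concave part

    def forward(self, x):
        return self.f(x).amax(dim=-1) - self.g(x).amax(dim=-1)

# Fit |x| (convex, hence trivially DC) from noisy samples.
x = torch.linspace(-2, 2, 256).unsqueeze(1)
y = x.abs().squeeze(1) + 0.05 * torch.randn(256)

model = DCPiecewiseLinear(dim=1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()
print(float(loss))   # small MSE after fitting
```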
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.