(Exhaustive) Symbolic Regression and model selection by minimum description length
- URL: http://arxiv.org/abs/2507.13033v1
- Date: Thu, 17 Jul 2025 12:04:15 GMT
- Title: (Exhaustive) Symbolic Regression and model selection by minimum description length
- Authors: Harry Desmond
- Abstract summary: Symbolic regression is the machine learning method for learning functions from data. Traditional algorithms have an unknown (and likely significant) probability of failing to find any given good function. I propose an exhaustive search and model selection by the minimum description length principle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic regression is the machine learning method for learning functions from data. After a brief overview of the symbolic regression landscape, I will describe the two main challenges that traditional algorithms face: they have an unknown (and likely significant) probability of failing to find any given good function, and they suffer from ambiguity and poorly-justified assumptions in their function-selection procedure. To address these I propose an exhaustive search and model selection by the minimum description length principle, which allows accuracy and complexity to be directly traded off by measuring each in units of information. I showcase the resulting publicly available Exhaustive Symbolic Regression algorithm on three open problems in astrophysics: the expansion history of the universe, the effective behaviour of gravity in galaxies and the potential of the inflaton field. In each case the algorithm identifies many functions superior to the literature standards. This general purpose methodology should find widespread utility in science and beyond.
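The trade-off described in the abstract, with accuracy and complexity both measured in units of information, can be illustrated with a minimal sketch. This is not the Exhaustive Symbolic Regression code or its exact description-length formula. It assumes a simplified two-part code: a Gaussian negative log-likelihood with known noise level for the accuracy term, k*log(n) for an expression of k tree nodes drawn from an alphabet of n symbols, and a BIC-like (p/2)*log(N) cost for p fitted parameters. The candidate functions, the alphabet size and all names below are hypothetical choices made for illustration.

```python
import math
import random

SIGMA = 0.1        # assumed known Gaussian noise level (illustrative)
N_SYMBOLS = 4      # assumed alphabet size: x, parameter, +, * (illustrative)


def neg_log_likelihood(residuals, sigma=SIGMA):
    """Accuracy term in nats: Gaussian negative log-likelihood."""
    n = len(residuals)
    chi2 = sum(r * r for r in residuals) / sigma**2
    return 0.5 * chi2 + 0.5 * n * math.log(2 * math.pi * sigma**2)


def fit_constant(xs, ys):
    """Least-squares fit of y = c; returns (parameters, residuals)."""
    c = sum(ys) / len(ys)
    return [c], [y - c for y in ys]


def fit_linear(xs, ys):
    """Least-squares fit of y = a*x; returns (parameters, residuals)."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return [a], [y - a * x for x, y in zip(xs, ys)]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x + b*x*x via the 2x2 normal equations."""
    s2 = sum(x**2 for x in xs)
    s3 = sum(x**3 for x in xs)
    s4 = sum(x**4 for x in xs)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    det = s2 * s4 - s3 * s3
    a = (t1 * s4 - s3 * t2) / det
    b = (s2 * t2 - s3 * t1) / det
    return [a, b], [y - a * x - b * x * x for x, y in zip(xs, ys)]


# (name, fitting routine, number of nodes k in the expression tree)
CANDIDATES = [
    ("c", fit_constant, 1),
    ("a*x", fit_linear, 3),
    ("a*x + b*x*x", fit_quadratic, 9),
]


def description_lengths(xs, ys):
    """Total codelength in nats for every candidate function."""
    out = {}
    for name, fit, k in CANDIDATES:
        params, residuals = fit(xs, ys)
        accuracy = neg_log_likelihood(residuals)
        structure = k * math.log(N_SYMBOLS)
        parameters = 0.5 * len(params) * math.log(len(xs))
        out[name] = accuracy + structure + parameters
    return out


# Synthetic data from y = 2x plus Gaussian noise (fixed seed).
rng = random.Random(0)
xs = [0.1 * i for i in range(1, 51)]
ys = [2.0 * x + rng.gauss(0.0, SIGMA) for x in xs]

lengths = description_lengths(xs, ys)
best = min(lengths, key=lengths.get)
for name, length in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {length:10.2f} nats")
print("selected:", best)
```

Because every term is expressed in nats, the slightly better fit of the quadratic candidate is weighed directly against the extra nodes and the extra parameter it needs, with no separately tuned complexity penalty.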
Related papers
- Scaling Up Unbiased Search-based Symbolic Regression [10.896025071832051]
We show that systematically searching spaces of small expressions finds solutions more accurate than those obtained by state-of-the-art symbolic regression methods. In particular, systematic search outperforms state-of-the-art symbolic regressors in terms of its ability to recover the true underlying symbolic expressions.
arXiv Detail & Related papers (2025-06-24T13:47:19Z)
- Discovering physical laws with parallel combinatorial tree search [57.05912962368898]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. Existing algorithms have faced a critical bottleneck in accuracy and efficiency for over a decade. We introduce a parallel combinatorial tree search (PCTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
- Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs [23.584313644411967]
We study the problem of discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant.
Our results include theory in the deterministic dynamics setting as well as counter-examples for alternative intuitive algorithms.
We show that these can be a double-edged sword: making the algorithms more successful when used correctly and causing dramatic failure when used incorrectly.
arXiv Detail & Related papers (2024-04-22T19:46:16Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Using machine learning to find exact analytic solutions to analytically posed physics problems [0.0]
We investigate the use of machine learning for solving analytic problems in theoretical physics.
In particular, symbolic regression (SR) has made rapid progress in recent years as a tool to fit data using functions whose overall form is not known in advance.
We use a state-of-the-art SR package to demonstrate how an exact solution can be found and make an attempt at solving an unsolved physics problem.
arXiv Detail & Related papers (2023-06-05T01:31:03Z)
- Generalization on the Unseen, Logic Reasoning and Degree Curriculum [25.7378861650474]
This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting.
We study how different network architectures trained by (S)GD perform under GOTU.
More specifically, this means an interpolator of the training data that has minimal Fourier mass on the higher degree basis elements.
arXiv Detail & Related papers (2023-01-30T17:44:05Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Exhaustive Symbolic Regression [0.0]
Exhaustive Symbolic Regression (ESR) is a rigorous method for combining preferences into a single objective.
We apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate.
We make our code and full equation sets publicly available.
arXiv Detail & Related papers (2022-11-21T13:48:52Z)
- Symbolic Regression for Space Applications: Differentiable Cartesian Genetic Programming Powered by Multi-objective Memetic Algorithms [10.191757341020216]
We propose a new multi-objective memetic algorithm that exploits a differentiable Cartesian Genetic Programming encoding to learn constants during evolutionary loops.
We show that this approach is competitive with or outperforms machine-learned black-box regression models and hand-engineered fits for two applications from space: the Mars Express thermal power estimation and the determination of the age of stars by gyrochronology.
arXiv Detail & Related papers (2022-06-13T14:44:15Z)
- Learning outside the Black-Box: The pursuit of interpretable models [78.32475359554395]
This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function.
Our interpretation represents a leap forward from the previous state of the art.
arXiv Detail & Related papers (2020-11-17T12:39:44Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with sparse regression.
It can be used to discover governing analytical equations as well as partial differential equations (PDEs).
arXiv Detail & Related papers (2020-04-03T17:21:57Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)