The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version
- URL: http://arxiv.org/abs/2404.17292v1
- Date: Fri, 26 Apr 2024 09:49:32 GMT
- Title: The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version
- Authors: Gabriel Kronberger, Fabricio Olivetti de Franca, Harry Desmond, Deaglan J. Bartlett, Lukas Kammerer,
- Abstract summary: We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings.
This enables us to quantify the success probability of finding the best possible expressions.
We compare the search efficiency of genetic programming to random search in the space of semantically unique expressions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings, allowing exhaustive enumeration of all solutions. This enables us to quantify the success probability of finding the best possible expressions, and to compare the search efficiency of genetic programming to random search in the space of semantically unique expressions. This analysis is made possible by improved algorithms for equality saturation, which we use to improve the Exhaustive Symbolic Regression algorithm; this produces the set of semantically unique expression structures, orders of magnitude smaller than the full symbolic regression search space. We compare the efficiency of random search in the set of unique expressions and genetic programming. For our experiments we use two real-world datasets where symbolic regression has been used to produce well-fitting univariate expressions: the Nikuradse dataset of flow in rough pipes and the Radial Acceleration Relation of galaxy dynamics. The results show that genetic programming in such limited settings explores only a small fraction of all unique expressions, and evaluates expressions repeatedly that are congruent to already visited expressions.
Related papers
- Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade when handling problems of complexity.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Efficient Generator of Mathematical Expressions for Symbolic Regression [0.0]
We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE.
HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space.
Finally, EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
arXiv Detail & Related papers (2023-02-20T10:40:29Z) - Symbolic Regression via Neural-Guided Genetic Programming Population
Seeding [6.9501458586819505]
Symbolic regression is a discrete optimization problem generally believed to be NP-hard.
Prior approaches to solving the problem include neural-guided search and genetic programming.
We propose a neural-guided component used to seed the starting population of a random restart genetic programming component.
arXiv Detail & Related papers (2021-10-29T19:26:41Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Symbolic Regression by Exhaustive Search: Reducing the Search Space
Using Syntactical Constraints and Efficient Semantic Structure Deduplication [2.055204980188575]
Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge on model structure is available.
In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues.
A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions.
arXiv Detail & Related papers (2021-09-28T17:47:51Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Zoetrope Genetic Programming for Regression [2.642406403099596]
The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions.
ZGP is validated using a large number of public domain regression datasets.
arXiv Detail & Related papers (2021-02-26T10:47:10Z) - A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix
Completion [60.52730146391456]
We propose a new non scalable low-rank regularizer called "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values and allows fast optimization by algorithms.
It obtains state-of-the-art recovery performance while being the fastest in existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z) - Fitness Landscape Analysis of Dimensionally-Aware Genetic Programming
Featuring Feynman Equations [8.477138002183713]
We conduct a fitness landscape analysis of dimensionallyaware genetic programming search spaces on a subset of equations from Richard Feynmans well-known lectures.
Our experiments show that the added information about the variable dimensionality can efficiently guide the search algorithm.
arXiv Detail & Related papers (2020-04-27T13:05:28Z) - The data-driven physical-based equations discovery using evolutionary
approach [77.34726150561087]
We describe the algorithm for the mathematical equations discovery from the given observations data.
The algorithm combines genetic programming with the sparse regression.
It could be used for governing analytical equation discovery as well as for partial differential equations (PDE) discovery.
arXiv Detail & Related papers (2020-04-03T17:21:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.