Improving Genetic Programming for Symbolic Regression with Equality Graphs
- URL: http://arxiv.org/abs/2501.17848v1
- Date: Wed, 29 Jan 2025 18:49:34 GMT
- Title: Improving Genetic Programming for Symbolic Regression with Equality Graphs
- Authors: Fabricio Olivetti de Franca, Gabriel Kronberger
- Abstract summary: We exploit the equality graph to store expressions and their equivalent forms.
We adapt the subtree operators to reduce the chances of revisiting expressions.
Results show that, for small expressions, this approach improves the performance of a simple GP algorithm to compete with PySR and Operon.
- Abstract: The search for symbolic regression models with genetic programming (GP) has a tendency to revisit expressions in their original or equivalent forms. Repeatedly evaluating equivalent expressions is inefficient, as it does not immediately lead to better solutions. However, evolutionary algorithms require diversity and should allow the accumulation of inactive building blocks that can play an important role at a later point. The equality graph is a data structure capable of compactly storing expressions and their equivalent forms, allowing an efficient check of whether an expression has already been visited in any of its stored equivalent forms. We exploit the e-graph to adapt the subtree operators to reduce the chances of revisiting expressions. Our adaptation, called eggp, stores every visited expression in the e-graph, allowing us to filter out, from the available selection of subtrees, all combinations that would create already visited expressions. Results show that, for small expressions, this approach improves the performance of a simple GP algorithm to the point of competing with PySR and Operon without increasing computational cost. As a highlight, eggp reliably delivered short yet accurate models for a selected set of benchmarks from SRBench and a set of real-world datasets.
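As a rough illustration of the filtering idea described in the abstract (not the authors' e-graph implementation, which maintains equivalence classes under a full set of rewrite rules), the sketch below keys a visited-set on a canonicalized expression form so that trivially equivalent offspring are detected and skipped during subtree operations. All names (Node, canonical, VisitedExpressions) and the simplifications applied are hypothetical stand-ins.

```python
# Minimal sketch (assumed, not the authors' implementation) of rejecting
# already-visited expressions in a GP loop. Equivalence detection here is
# limited to commutativity and constant folding; an e-graph would apply a
# much richer set of rewrite rules.
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str            # '+', '*', 'var', or 'const'
    args: tuple = ()   # child Nodes for '+'/'*'; payload for 'var'/'const'

def canonical(node: Node) -> Node:
    """Normalize an expression so that trivially equivalent forms collide."""
    if node.op in ('+', '*'):
        kids = tuple(sorted((canonical(a) for a in node.args), key=repr))
        if all(k.op == 'const' for k in kids):            # constant folding
            vals = [k.args[0] for k in kids]
            value = sum(vals) if node.op == '+' else math.prod(vals)
            return Node('const', (value,))
        return Node(node.op, kids)                        # commutativity: sort children
    return node

class VisitedExpressions:
    """Stand-in for the e-graph: a set of canonical forms seen so far."""
    def __init__(self):
        self._seen = set()

    def already_visited(self, expr: Node) -> bool:
        """Insert expr; report whether an equivalent form was seen before."""
        key = canonical(expr)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False

# Usage: a subtree operator could discard offspring flagged as visited.
visited = VisitedExpressions()
x = Node('var', ('x',))
e1 = Node('+', (x, Node('const', (1.0,))))   # x + 1
e2 = Node('+', (Node('const', (1.0,)), x))   # 1 + x, equivalent by commutativity
assert not visited.already_visited(e1)       # first time this expression is seen
assert visited.already_visited(e2)           # recognized as an equivalent revisit
```

A real e-graph would merge equivalence classes under many more rewrites (associativity, distributivity, inverse operations), so this check under-detects equivalences compared to eggp.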
Related papers
- rEGGression: an Interactive and Agnostic Tool for the Exploration of Symbolic Regression Models [0.0]
We introduce rEGGression, a tool using e-graphs to enable the exploration of a large set of symbolic expressions.
The main highlight is its focus on exploring the building blocks found during the search, which can help experts gain insights about the studied phenomena.
arXiv Detail & Related papers (2025-01-29T18:57:44Z) - The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version [0.0]
We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings.
This enables us to quantify the success probability of finding the best possible expressions.
We compare the search efficiency of genetic programming to random search in the space of semantically unique expressions.
arXiv Detail & Related papers (2024-04-26T09:49:32Z) - Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing [0.7373617024876725]
Symbolic regression searches for parametric models that accurately fit a dataset, prioritizing simplicity and interpretability.
Fast algebraic simplification may not fully simplify an expression, and exact methods can be infeasible depending on the size or complexity of the expressions.
We propose a novel simplification and bloat control for SR employing an efficient memoization with locality-sensitive hashing (LSH); a rough illustrative sketch of LSH-based memoization appears after this list.
arXiv Detail & Related papers (2024-04-08T22:54:14Z) - Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Efficient Generator of Mathematical Expressions for Symbolic Regression [0.0]
We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE.
HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space.
Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
arXiv Detail & Related papers (2023-02-20T10:40:29Z) - Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims to differentiate hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z) - Similarity-aware Positive Instance Sampling for Graph Contrastive Pre-training [82.68805025636165]
We propose to select positive graph instances directly from existing graphs in the training set.
Our selection is based on certain domain-specific pair-wise similarity measurements.
Besides, we develop an adaptive node-level pre-training method to dynamically mask nodes to distribute them evenly in the graph.
arXiv Detail & Related papers (2022-06-23T20:12:51Z) - GraphQ IR: Unifying Semantic Parsing of Graph Query Language with Intermediate Representation [91.27083732371453]
We propose a unified intermediate representation (IR) for graph query languages, namely GraphQ IR.
With the IR's natural-language-like representation that bridges the semantic gap and its formally defined syntax that maintains the graph structure, neural semantic parsing can more effectively convert user queries into GraphQ IR.
Our approach can consistently achieve state-of-the-art performance on KQA Pro, Overnight and MetaQA.
arXiv Detail & Related papers (2022-05-24T13:59:53Z) - Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching [58.39970828272366]
Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph in a data graph G.
The matching order plays a critical role in the time efficiency of these backtracking-based subgraph matching algorithms.
In this paper, we apply Reinforcement Learning (RL) and Graph Neural Network (GNN) techniques for the first time to generate high-quality matching orders for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z) - Zoetrope Genetic Programming for Regression [2.642406403099596]
The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions.
ZGP is validated using a large number of public domain regression datasets.
arXiv Detail & Related papers (2021-02-26T10:47:10Z)
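The inexact-simplification entry above mentions memoization with locality-sensitive hashing; the sketch below illustrates that general idea under assumed details: expressions are bucketed by the sign pattern of random projections of their outputs on a fixed input sample, and each bucket keeps its smallest expression as the representative. The class name LSHMemo, the hashing scheme, and the size measure are illustrative assumptions, not the paper's algorithm.

```python
# Rough sketch (assumed details) of semantics-based memoization with a
# locality-sensitive hash: expressions whose outputs agree on a sample of
# inputs fall into the same bucket, and the smallest one is kept.
import numpy as np

class LSHMemo:
    """Buckets expressions by the sign pattern of random projections of
    their outputs on a fixed sample; each bucket stores the smallest
    expression seen, so larger equivalents can be replaced by it."""

    def __init__(self, sample_inputs: np.ndarray, n_planes: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.x = sample_inputs                              # shape (n_points,)
        self.planes = rng.normal(size=(n_planes, len(sample_inputs)))
        self.buckets = {}                                   # hash key -> (size, expr_fn, text)

    def _key(self, outputs: np.ndarray) -> tuple:
        z = (outputs - outputs.mean()) / (outputs.std() + 1e-12)
        return tuple((self.planes @ z > 0).astype(int))

    def simplify(self, expr_fn, size: int, text: str) -> str:
        """Return the representative (smallest) expression with (near-)equal
        semantics on the sample, inserting expr_fn if its bucket is new."""
        key = self._key(expr_fn(self.x))
        best = self.buckets.get(key)
        if best is None or size < best[0]:
            self.buckets[key] = (size, expr_fn, text)
        return self.buckets[key][2]

# Usage: x*2 + x and 3*x land in the same bucket; the shorter form is kept.
memo = LSHMemo(np.linspace(-1, 1, 32))
print(memo.simplify(lambda x: 3 * x, size=3, text="3*x"))        # "3*x"
print(memo.simplify(lambda x: x * 2 + x, size=5, text="x*2+x"))  # "3*x"
```

By construction, expressions that merely agree on the sampled inputs can collide, which is why this kind of simplification is described as inexact.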
This list is automatically generated from the titles and abstracts of the papers on this site.