Zobrist Hash-based Duplicate Detection in Symbolic Regression
- URL: http://arxiv.org/abs/2508.13859v1
- Date: Tue, 19 Aug 2025 14:18:16 GMT
- Title: Zobrist Hash-based Duplicate Detection in Symbolic Regression
- Authors: Bogdan Burlacu,
- Abstract summary: Genetic Programming (GP) is an evolutionary search method that evolves a population of mathematical expressions through the mechanism of natural selection.<n>We show that many points in the search space are re-visited and re-evaluated multiple times by the algorithm, leading to wasted computational effort.<n>We introduce a caching mechanism based on the Zobrist hash, a type of hashing frequently used in abstract board games.
- Score: 0.5439020425819
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic regression encompasses a family of search algorithms that aim to discover the best fitting function for a set of data without requiring an a priori specification of the model structure. The most successful and commonly used technique for symbolic regression is Genetic Programming (GP), an evolutionary search method that evolves a population of mathematical expressions through the mechanism of natural selection. In this work we analyze the efficiency of the evolutionary search in GP and show that many points in the search space are re-visited and re-evaluated multiple times by the algorithm, leading to wasted computational effort. We address this issue by introducing a caching mechanism based on the Zobrist hash, a type of hashing frequently used in abstract board games for the efficient construction and subsequent update of transposition tables. We implement our caching approach using the open-source framework Operon and demonstrate its performance on a selection of real-world regression problems, where we observe up to 34\% speedups without any detrimental effects on search quality. The hashing approach represents a straightforward way to improve runtime performance while also offering some interesting possibilities for adjusting search strategy based on cached information.
Related papers
- Equality Graph Assisted Symbolic Regression [0.5156484100374058]
Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in term of accuracy.<n>We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure.<n>We show that SymRegg is capable of improving the efficiency of the search, maintaining consistently accurate results across different datasets.
arXiv Detail & Related papers (2025-11-02T16:57:22Z) - Improving Genetic Programming for Symbolic Regression with Equality Graphs [0.0]
We exploit the equality graph to store expressions and their equivalent forms.<n>We adapt the subtree operators to reduce the chances of revisiting expressions.<n>Results show that, for small expressions, this approach improves the performance of a simple GP algorithm to compete with PySR and Operon.
arXiv Detail & Related papers (2025-01-29T18:49:34Z) - Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.<n>Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z) - The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version [0.0]
We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings.
This enables us to quantify the success probability of finding the best possible expressions.
We compare the search efficiency of genetic programming to random search in the space of semantically unique expressions.
arXiv Detail & Related papers (2024-04-26T09:49:32Z) - Unified Functional Hashing in Automatic Machine Learning [58.77232199682271]
We show that large efficiency gains can be obtained by employing a fast unified functional hash.
Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently.
We show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery.
arXiv Detail & Related papers (2023-02-10T18:50:37Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Improving Passage Retrieval with Zero-Shot Question Generation [109.11542468380331]
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage.
arXiv Detail & Related papers (2022-04-15T14:51:41Z) - Symbolic Regression by Exhaustive Search: Reducing the Search Space
Using Syntactical Constraints and Efficient Semantic Structure Deduplication [2.055204980188575]
Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge on model structure is available.
In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues.
A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions.
arXiv Detail & Related papers (2021-09-28T17:47:51Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - A Genetic Algorithm for Obtaining Memory Constrained Near-Perfect
Hashing [0.0]
We present an approach based on hash tables that focuses on both minimizing the number of comparisons performed during the search and minimizing the total collection size.
The paper results show that near-perfect hashing is faster than binary search, yet uses less memory than perfect hashing.
arXiv Detail & Related papers (2020-07-16T12:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.