EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
- URL: http://arxiv.org/abs/2511.05849v1
- Date: Sat, 08 Nov 2025 04:39:11 GMT
- Title: EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
- Authors: Nan Jiang, Ziyi Wang, Yexiang Xue
- Abstract summary: We introduce EGG-SR, a unified framework that integrates equality graphs into diverse symbolic regression algorithms. EGG-SR compactly represents equivalent expressions through the proposed EGG module. EGG-SR consistently enhances multiple baselines across challenging benchmarks.
- Score: 22.0886196410259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, which is an important task in AI-driven scientific discovery. Yet the exponential growth of the search space of expressions renders the task computationally challenging. A promising yet underexplored direction for reducing the effective search space and accelerating training lies in symbolic equivalence: many expressions, although syntactically different, define the same function -- for example, $\log(x_1^2x_2^3)$, $\log(x_1^2)+\log(x_2^3)$, and $2\log(x_1)+3\log(x_2)$. Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning. We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms, including Monte Carlo Tree Search (MCTS), deep reinforcement learning (DRL), and large language models (LLMs). EGG-SR compactly represents equivalent expressions through the proposed EGG module, enabling more efficient learning by: (1) pruning redundant subtree exploration in EGG-MCTS, (2) aggregating rewards across equivalence classes in EGG-DRL, and (3) enriching feedback prompts in EGG-LLM. Under mild assumptions, we show that embedding e-graphs tightens the regret bound of MCTS and reduces the variance of the DRL gradient estimator. Empirically, EGG-SR consistently enhances multiple baselines across challenging benchmarks, discovering equations with lower normalized mean squared error than state-of-the-art methods. Code implementation is available at: https://www.github.com/jiangnanhugo/egg-sr.
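The symbolic-equivalence idea in the abstract can be illustrated with a minimal sketch. It is not the EGG module and does not build an e-graph: it only groups candidate expressions by comparing their values at random positive sample points, a simple numeric stand-in for equivalence detection. The names `CANDIDATES`, `same_function`, and `group_equivalent` are hypothetical and exist only for this example.

```python
import math
import random

# The three equivalent forms quoted in the abstract, plus one expression
# that is NOT equivalent, written as plain Python callables (assumption:
# x1, x2 > 0 so every logarithm is defined).
CANDIDATES = {
    "log(x1^2 * x2^3)": lambda x1, x2: math.log(x1**2 * x2**3),
    "log(x1^2) + log(x2^3)": lambda x1, x2: math.log(x1**2) + math.log(x2**3),
    "2*log(x1) + 3*log(x2)": lambda x1, x2: 2 * math.log(x1) + 3 * math.log(x2),
    "log(x1) + log(x2)": lambda x1, x2: math.log(x1) + math.log(x2),
}


def same_function(a, b, tol=1e-9):
    """Return True if two value vectors agree at every sample point."""
    return all(math.isclose(u, v, rel_tol=tol, abs_tol=tol) for u, v in zip(a, b))


def group_equivalent(candidates, n_points=8, seed=0):
    """Group expression names whose values match on random positive inputs."""
    rng = random.Random(seed)
    points = [(rng.uniform(0.5, 3.0), rng.uniform(0.5, 3.0)) for _ in range(n_points)]
    classes = []  # list of (representative values, [expression names])
    for name, fn in candidates.items():
        values = [fn(x1, x2) for x1, x2 in points]
        for rep, names in classes:
            if same_function(rep, values):
                names.append(name)
                break
        else:
            # No existing class matched, so this expression starts a new one.
            classes.append((values, [name]))
    return [names for _, names in classes]


groups = group_equivalent(CANDIDATES)
for names in groups:
    print(names)
```

Numeric agreement on sampled points only suggests equivalence; an e-graph, as used in the paper, instead derives equivalence from rewrite rules and stores all variants compactly.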
Related papers
- GENSR: Symbolic Regression Based in Equation Generative Space [15.186848349610363]
GenSR is a generative latent space-based SR framework.
From a Bayesian perspective, GenSR reframes the SR task as maximizing the conditional distribution $p(\mathrm{Equ.} \mid \mathrm{Num.})$.
arXiv Detail & Related papers (2026-02-24T05:14:34Z)
- Equality Graph Assisted Symbolic Regression [0.5156484100374058]
Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in terms of accuracy.
We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure.
We show that SymRegg is capable of improving the efficiency of the search, maintaining consistently accurate results across different datasets.
arXiv Detail & Related papers (2025-11-02T16:57:22Z)
- Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.
Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.
Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.
We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
- Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [79.75818239774952]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information.
Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system.
We propose Align-GRAG, a novel reasoning-guided dual alignment framework in the post-retrieval phase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z)
- Compile Scene Graphs with Reinforcement Learning [69.36723767339001]
Next-token prediction is the fundamental principle for training large language models (LLMs).
We introduce R1-SGG, a multimodal LLM (M-LLM) initially trained via supervised fine-tuning (SFT) on the scene graph dataset.
We design a set of graph-centric rewards, including three recall-based variants -- Hard Recall, Hard Recall+Relax, and Soft Recall.
arXiv Detail & Related papers (2025-04-18T10:46:22Z)
- Learning sparse generalized linear models with binary outcomes via iterative hard thresholding [20.28503550819373]
In statistics, generalized linear models (GLMs) are widely used for modeling data.
In this work, we propose to use and analyze an iterative hard thresholding (projected gradient descent on the ReLU loss) algorithm, called binary iterative hard thresholding (BIHT).
We establish that BIHT is statistically efficient and converges to the correct solution for parameter estimation in a general class of sparse binary GLMs.
arXiv Detail & Related papers (2025-02-25T17:42:33Z)
- Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs).
We identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring.
We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
arXiv Detail & Related papers (2025-02-16T16:12:01Z)
- Improving Genetic Programming for Symbolic Regression with Equality Graphs [0.0]
We exploit the equality graph to store expressions and their equivalent forms.
We adapt the subtree operators to reduce the chances of revisiting expressions.
Results show that, for small expressions, this approach improves the performance of a simple GP algorithm to compete with PySR and Operon.
arXiv Detail & Related papers (2025-01-29T18:49:34Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Efficient Generator of Mathematical Expressions for Symbolic Regression [0.0]
We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE.
HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space.
Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
arXiv Detail & Related papers (2023-02-20T10:40:29Z)
- RU-Net: Regularized Unrolling Network for Scene Graph Generation [92.95032610978511]
Scene graph generation (SGG) aims to detect objects and predict the relationships between each pair of objects.
Existing SGG methods usually suffer from several issues, including 1) ambiguous object representations, and 2) low diversity in relationship predictions.
We propose a regularized unrolling network (RU-Net) to address both problems.
arXiv Detail & Related papers (2022-05-03T04:21:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.