VaSST: Variational Inference for Symbolic Regression using Soft Symbolic Trees
- URL: http://arxiv.org/abs/2602.23561v1
- Date: Fri, 27 Feb 2026 00:07:31 GMT
- Title: VaSST: Variational Inference for Symbolic Regression using Soft Symbolic Trees
- Authors: Somjit Roy, Pritam Dey, Bani K. Mallick
- Abstract summary: We introduce VaSST, a scalable probabilistic framework for symbolic regression based on variational inference. VaSST achieves superior performance in both structural recovery and predictive accuracy compared to state-of-the-art symbolic regression methods.
- Score: 2.6521352889229446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic regression has recently gained traction in AI-driven scientific discovery, aiming to recover explicit closed-form expressions from data that reveal underlying physical laws. Despite recent advances, existing methods remain dominated by heuristic search algorithms or data-intensive approaches that assume low-noise regimes and lack principled uncertainty quantification. Fully probabilistic formulations are scarce, and existing Markov chain Monte Carlo-based Bayesian methods often struggle to efficiently explore the highly multimodal combinatorial space of symbolic expressions. We introduce VaSST, a scalable probabilistic framework for symbolic regression based on variational inference. VaSST employs a continuous relaxation of symbolic expression trees, termed soft symbolic trees, where discrete operator and feature assignments are replaced by soft distributions over allowable components. This relaxation transforms the combinatorial search over an astronomically large symbolic space into an efficient gradient-based optimization problem while preserving a coherent probabilistic interpretation. The learned soft representations induce posterior distributions over symbolic structures, enabling principled uncertainty quantification. Across simulated experiments and Feynman Symbolic Regression Database within SRBench, VaSST achieves superior performance in both structural recovery and predictive accuracy compared to state-of-the-art symbolic regression methods.
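The abstract's central device is the soft symbolic tree: each discrete operator choice at a tree node is replaced by a trainable distribution over candidate operators, so the tree's output becomes differentiable. The sketch below illustrates that relaxation for a single node with a softmax over three illustrative operators; it is not the authors' implementation, and the operator set and logits are assumptions for demonstration.

```python
import numpy as np

# Illustrative candidate unary operators for one tree node.
OPS = {"sin": np.sin, "exp": np.exp, "square": np.square}

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def soft_node(x, logits):
    """Continuous relaxation of a discrete operator choice:
    the node outputs the softmax-weighted mixture of all
    candidate operators, so gradients flow into the logits."""
    probs = softmax(logits)
    outs = np.stack([op(x) for op in OPS.values()])
    return probs @ outs, probs

x = np.linspace(0.0, 1.0, 5)
y, probs = soft_node(x, logits=np.array([2.0, -1.0, 0.0]))
# probs concentrates on "sin"; as the logits sharpen during
# optimization, the soft node approaches a hard symbolic choice.
```

As the learned distributions sharpen, sampling a discrete operator per node from them recovers ordinary symbolic expression trees, which is what induces a posterior over symbolic structures.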
Related papers
- Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer [65.38883376379812]
We propose the Discrete Transformer, an architecture engineered to bridge the gap between continuous representations and discrete symbolic logic. Empirically, the Discrete Transformer not only achieves performance comparable to RNN-based baselines but crucially extends interpretability to continuous variable domains.
arXiv Detail & Related papers (2026-01-09T12:49:41Z) - SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition toward collapsed states, offering theoretical insight into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Bayesian Symbolic Regression via Posterior Sampling [0.0]
Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application. This paper introduces a Sequential Monte Carlo framework for Bayesian symbolic regression that approximates the posterior distribution over symbolic expressions.
arXiv Detail & Related papers (2025-12-11T17:38:20Z) - Hierarchical Bayesian Operator-induced Symbolic Regression Trees for Structural Learning of Scientific Expressions [3.8545239266455185]
We develop a hierarchical Bayesian framework for symbolic regression that represents scientific laws as ensembles of tree-structured symbolic expressions with a regularized tree prior. We establish a near-minimax rate of Bayesian posterior concentration, providing the first rigorous guarantee in the context of symbolic regression. Empirical evaluation demonstrates robust performance of the proposed methodology against state-of-the-art competing methods.
arXiv Detail & Related papers (2025-09-24T02:42:25Z) - Discovering Mathematical Equations with Diffusion Language Model [6.384075523245284]
We introduce DiffuSR, a pre-training framework for symbolic regression built upon a continuous-state diffusion language model. DiffuSR employs a trainable embedding layer within the diffusion process to map discrete mathematical symbols into a continuous latent space. We also design an effective inference strategy to enhance the accuracy of the diffusion-based equation generator.
arXiv Detail & Related papers (2025-09-16T14:53:44Z) - Symbolic Feedforward Networks for Probabilistic Finite Automata: Exact Simulation and Learnability [0.0]
We show that probabilistic finite automata can be exactly simulated using symbolic feedforward neural networks. We show that these symbolic simulators are not only expressive but learnable.
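The exact-simulation claim rests on a classical fact: a probabilistic finite automaton computes string probabilities via one stochastic matrix per input symbol, so each processing step is a single linear layer. Below is a minimal sketch under that interpretation; the two-state automaton and its transition probabilities are illustrative, not taken from the paper.

```python
import numpy as np

# A two-state PFA over alphabet {"a", "b"} (illustrative numbers).
# Each row of each matrix is a probability distribution over next states.
T = {
    "a": np.array([[0.9, 0.1],
                   [0.2, 0.8]]),
    "b": np.array([[0.5, 0.5],
                   [0.3, 0.7]]),
}
init = np.array([1.0, 0.0])    # start in state 0
accept = np.array([0.0, 1.0])  # state 1 is accepting

def string_prob(s):
    """Acceptance probability of string s: one matrix-vector
    product per symbol, i.e. one linear 'layer' per input step."""
    v = init
    for sym in s:
        v = v @ T[sym]
    return float(v @ accept)

p = string_prob("ab")  # 0.9*0.5 + 0.1*0.7 = 0.52
```

Because every step is linear in the state distribution, the whole computation unrolls into a feedforward network whose weights are the transition matrices, which is the sense in which the simulation is exact.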
arXiv Detail & Related papers (2025-09-12T07:57:01Z) - Discovering physical laws with parallel symbolic enumeration [67.36739393470869]
We introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data. Experiments show that PSE achieves higher accuracy and faster computation compared to the state-of-the-art baseline algorithms. PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
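The forward (noising) process described here is a continuous-time Markov chain on a discrete state space: the chain holds in a state for an exponentially distributed time and then jumps according to a rate matrix. The sketch below simulates such a process with Gillespie's algorithm; the uniform rate matrix is an illustrative choice of noising process, not the paper's specific construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform jump process on K states: off-diagonal rate r/(K-1),
# diagonal -r, so rows of the rate matrix Q sum to zero.
K, r = 4, 1.0
Q = np.full((K, K), r / (K - 1))
np.fill_diagonal(Q, -r)

def gillespie(state, t_max):
    """Sample the state at time t_max of the forward Markov jump
    process: exponential holding times, then a jump drawn from
    the normalized off-diagonal rates."""
    t = 0.0
    while True:
        rate = -Q[state, state]
        t += rng.exponential(1.0 / rate)
        if t >= t_max:
            return state
        probs = Q[state].clip(min=0.0)
        probs /= probs.sum()
        state = rng.choice(K, p=probs)

samples = [gillespie(0, t_max=5.0) for _ in range(2000)]
# After long times the chain forgets its start: the empirical
# distribution approaches uniform over the K states.
```

The reverse (denoising) process is again a continuous-time Markov chain, with rates tied to the learned score; this sketch only covers the forward side.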
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Symbolic Regression by Exhaustive Search: Reducing the Search Space Using Syntactical Constraints and Efficient Semantic Structure Deduplication [2.055204980188575]
Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge on model structure is available.
In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues.
A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions.
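A common way to implement the caching mechanism for semantically equivalent expressions is to fingerprint each candidate by its values on a fixed grid of probe points: two expressions with identical fingerprints are treated as the same model. The sketch below illustrates that trick; the probe grid, tolerance, and candidate expressions are assumptions for demonstration, not the chapter's exact mechanism.

```python
import numpy as np

# Fixed probe points shared by all candidates.
PROBES = np.linspace(0.1, 2.0, 16)

def fingerprint(expr_fn, decimals=8):
    """Hashable key from an expression's values on the probe grid;
    rounding absorbs harmless floating-point noise."""
    return tuple(np.round(expr_fn(PROBES), decimals))

candidates = {
    "2*x":   lambda x: 2 * x,
    "x + x": lambda x: x + x,   # semantically equal to 2*x
    "x**2":  lambda x: x ** 2,
}

seen = {}
unique = []
for name, fn in candidates.items():
    key = fingerprint(fn)
    if key not in seen:      # cache hit => duplicate, skip it
        seen[key] = name
        unique.append(name)
# "x + x" collapses onto "2*x": only two distinct models remain.
```

In an exhaustive enumeration this check runs once per generated structure, so syntactically distinct but semantically identical expressions are evaluated and fitted only once.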
arXiv Detail & Related papers (2021-09-28T17:47:51Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of noise in its success remains unclear. We show that multiplicative noise commonly arises in the optimization of model parameters due to minibatch variance, and that it produces heavy-tailed behavior. A detailed analysis examines the key factors involved, including step size and data, with consistent results across state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.