Hierarchical Bayesian Operator-induced Symbolic Regression Trees for Structural Learning of Scientific Expressions
- URL: http://arxiv.org/abs/2509.19710v1
- Date: Wed, 24 Sep 2025 02:42:25 GMT
- Title: Hierarchical Bayesian Operator-induced Symbolic Regression Trees for Structural Learning of Scientific Expressions
- Authors: Somjit Roy, Pritam Dey, Debdeep Pati, Bani K. Mallick
- Abstract summary: We develop a hierarchical Bayesian framework for symbolic regression that represents scientific laws as ensembles of tree-structured symbolic expressions with a regularized tree prior.
We establish a near-minimax rate of Bayesian posterior concentration, providing the first rigorous guarantee in the context of symbolic regression.
Empirical evaluation demonstrates robust performance of our proposed methodology against state-of-the-art competing modules.
- Score: 3.8545239266455185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of Scientific Machine Learning has heralded a transformative era in scientific discovery, driving progress across diverse domains. Central to this progress is uncovering scientific laws from experimental data through symbolic regression. However, existing approaches are dominated by heuristic algorithms or data-hungry black-box methods, which often demand low-noise settings and lack principled uncertainty quantification. Motivated by interpretable Statistical Artificial Intelligence, we develop a hierarchical Bayesian framework for symbolic regression that represents scientific laws as ensembles of tree-structured symbolic expressions endowed with a regularized tree prior. This coherent probabilistic formulation enables full posterior inference via an efficient Markov chain Monte Carlo algorithm, yielding a balance between predictive accuracy and structural parsimony. To guide symbolic model selection, we develop a marginal posterior-based criterion adhering to the Occam's window principle and further quantify structural fidelity to ground truth through a tailored expression-distance metric. On the theoretical front, we establish a near-minimax rate of Bayesian posterior concentration, providing the first rigorous guarantee in the context of symbolic regression. Empirical evaluation demonstrates robust performance of our proposed methodology against state-of-the-art competing modules on a simulated example, a suite of canonical Feynman equations, and a single-atom catalysis dataset.
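To make the tree-structured representation and the regularized tree prior concrete, here is a minimal Python sketch (not the authors' implementation): it evaluates a small expression tree, scores it with a BART-style depth-penalized splitting prior, and combines that prior with a Gaussian likelihood into an unnormalized log posterior. The operator set, the prior form alpha * (1 + depth)^(-beta), the noise level sigma, and all names are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

# Illustrative operator set; the paper's operator library may differ.
BINARY_OPS = {"+": np.add, "*": np.multiply}
UNARY_OPS = {"sin": np.sin, "exp": np.exp}

class Node:
    """A node of a symbolic expression tree: an operator, a variable, or a constant."""
    def __init__(self, label, children=()):
        self.label = label            # e.g. "+", "sin", "x0", or a float constant
        self.children = list(children)

    def evaluate(self, X):
        """Evaluate the subtree on an (n, d) design matrix X."""
        if self.label in BINARY_OPS:
            left, right = (c.evaluate(X) for c in self.children)
            return BINARY_OPS[self.label](left, right)
        if self.label in UNARY_OPS:
            return UNARY_OPS[self.label](self.children[0].evaluate(X))
        if isinstance(self.label, str) and self.label.startswith("x"):
            return X[:, int(self.label[1:])]          # variable leaf, e.g. "x1"
        return np.full(X.shape[0], float(self.label))  # constant leaf

def log_tree_prior(node, depth=0, alpha=0.95, beta=1.5):
    """Assumed BART-style regularized tree prior: a node at depth d is internal
    with probability alpha * (1 + d)^(-beta), which favours shallow, parsimonious trees."""
    p_split = alpha * (1.0 + depth) ** (-beta)
    if node.children:
        return np.log(p_split) + sum(log_tree_prior(c, depth + 1, alpha, beta)
                                     for c in node.children)
    return np.log(1.0 - p_split)

def log_posterior(node, X, y, sigma=0.1):
    """Unnormalized log posterior = Gaussian log likelihood + regularized tree prior."""
    resid = y - node.evaluate(X)
    loglik = -0.5 * np.sum((resid / sigma) ** 2) - resid.size * np.log(sigma)
    return loglik + log_tree_prior(node)

# Example: score the candidate expression sin(x0) + 0.5 * x1 against noisy data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
tree = Node("+", [Node("sin", [Node("x0")]),
                  Node("*", [Node(0.5), Node("x1")])])
print(log_posterior(tree, X, y))
```

An MCMC sampler of the kind the abstract describes would propose local edits to such trees (grow, prune, swap an operator) and accept or reject them using exactly this kind of prior-plus-likelihood score.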
Related papers
- VaSST: Variational Inference for Symbolic Regression using Soft Symbolic Trees [2.6521352889229446]
We introduce VaSST, a scalable probabilistic framework for symbolic regression based on variational inference.
VaSST achieves superior performance in both structural recovery and predictive accuracy compared to state-of-the-art symbolic regression methods.
arXiv Detail & Related papers (2026-02-27T00:07:31Z) - Knowledge-Informed Kernel State Reconstruction for Interpretable Dynamical System Discovery [46.9843470803458]
MAAT (Model Aware Approximation of Trajectories) is a framework for symbolic discovery built on knowledge-informed Kernel State Reconstruction.
It substantially reduces state-estimation MSE for trajectories and derivatives used by downstream symbolic regression.
arXiv Detail & Related papers (2026-01-29T21:15:52Z) - SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for analyzing model collapse.
By deriving deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space.
We demonstrate that SIGMA effectively captures the transition towards collapse, offering theoretical insights into its mechanics.
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Bayesian Symbolic Regression via Posterior Sampling [0.0]
Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application.
This paper introduces a Sequential Monte Carlo framework for Bayesian symbolic regression that approximates the posterior distribution over symbolic expressions.
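The sequential Monte Carlo idea can be sketched generically: reweight a population of candidate expressions under a tempered likelihood and resample whenever the effective sample size collapses. The toy sketch below works over a fixed candidate pool and is not the paper's algorithm; a real SMC sampler for symbolic regression would also propose and mutate expression trees. The candidate pool, tempering schedule, and noise level are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from y = sin(x) + 0.3*x with Gaussian noise.
x = rng.uniform(-2, 2, 300)
y = np.sin(x) + 0.3 * x + rng.normal(scale=0.2, size=x.size)

# Hypothetical fixed pool of candidate expressions.
candidates = {
    "sin(x) + 0.3*x": lambda x: np.sin(x) + 0.3 * x,
    "0.5*x":          lambda x: 0.5 * x,
    "x**2":           lambda x: x ** 2,
    "sin(x)":         lambda x: np.sin(x),
}

def log_lik(f, sigma=0.2):
    r = y - f(x)
    return -0.5 * np.sum((r / sigma) ** 2)

# Particles are candidate labels; weights start uniform.
labels = list(candidates)
particles = rng.choice(labels, size=500)
log_w = np.zeros(particles.size)

# Temper the likelihood from prior (t=0) to posterior (t=1) in small steps,
# resampling when the effective sample size (ESS) drops below half the population.
temps = np.linspace(0.0, 1.0, 11)
for t_prev, t_next in zip(temps[:-1], temps[1:]):
    log_w += (t_next - t_prev) * np.array([log_lik(candidates[p]) for p in particles])
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)
    if ess < 0.5 * particles.size:
        particles = rng.choice(particles, size=particles.size, p=w)
        log_w = np.zeros(particles.size)

# Posterior approximation: fraction of particles carrying each expression.
vals, counts = np.unique(particles, return_counts=True)
print(dict(zip(vals, counts / particles.size)))
```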
arXiv Detail & Related papers (2025-12-11T17:38:20Z) - Interpretable Neural Approximation of Stochastic Reaction Dynamics with Guaranteed Reliability [4.736119820998459]
We introduce DeepSKA, a neural framework that achieves interpretability, guaranteed reliability, and substantial computational gains.
DeepSKA yields mathematically transparent representations that generalise across states, times, and output functions, and it integrates this structure with a small number of simulations to produce unbiased, provably convergent, and dramatically lower-variance estimates than classical Monte Carlo.
arXiv Detail & Related papers (2025-12-06T04:45:31Z) - Bayesian symbolic regression: Automated equation discovery from a physicists' perspective [0.0]
Symbolic regression automates the process of learning closed-form mathematical models from data.
Standard approaches to symbolic regression rely on model selection criteria, regularization, and exploration of the model space.
We show how the probabilistic approach establishes model plausibility from basic considerations and explicit approximations.
arXiv Detail & Related papers (2025-07-22T17:53:15Z) - Cycle-Consistent Helmholtz Machine: Goal-Seeded Simulation via Inverted Inference [5.234742752529437]
We introduce the Cycle-Consistent Helmholtz Machine (C$^2$HM).
C$^2$HM reframes inference as a goal-seeded, asymmetric process grounded in structured internal priors.
By offering a biologically inspired alternative to classical amortized inference, C$^2$HM reconceives generative modeling as intentional simulation.
arXiv Detail & Related papers (2025-07-03T17:24:27Z) - Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees [20.285132886770146]
We introduce an equivariant representation learning framework that simultaneously addresses regression, conditional probability estimation, and uncertainty quantification.
Grounded in operator and group representation theory, our framework approximates the spectral decomposition of the conditional expectation operator.
Empirical evaluations on synthetic datasets and real-world robotics applications confirm the potential of our approach.
arXiv Detail & Related papers (2025-05-26T10:47:23Z) - Controlled Agentic Planning & Reasoning for Mechanism Synthesis [18.8323743697237]
This work presents a dual-agent LLM-based reasoning framework for automated planar mechanism synthesis.
From a natural-language task description, the system composes symbolic constraints and equations, generates and parametrises simulation code, and iteratively refines designs via critic-driven feedback.
arXiv Detail & Related papers (2025-05-23T08:16:32Z) - The Foundations of Tokenization: Statistical and Computational Concerns [51.370165245628975]
Tokenization is a critical step in the NLP pipeline.
Despite its recognized importance as a standard representation method in NLP, the theoretical underpinnings of tokenization are not yet fully understood.
The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.
arXiv Detail & Related papers (2024-07-16T11:12:28Z) - A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z) - Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives.
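To make these failure modes concrete, here is a small numpy sketch contrasting deterministic nearest-neighbour code assignment with stochastic assignment, together with one illustrative usage-entropy regularizer. The specific regularizer, temperature, and all names are assumptions for exposition and are not the losses proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def quantize(z, codebook, temperature=1.0, stochastic=True):
    """Assign each latent vector to a codebook entry.

    Deterministic nearest-neighbour assignment tends to leave many codes unused
    (codebook collapse); sampling from a softmax over negative distances keeps
    more codes alive at the cost of a noisier reconstruction target.
    """
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (n, K) squared distances
    if stochastic:
        logits = -d2 / temperature
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        codes = np.array([rng.choice(len(codebook), p=p) for p in probs])
    else:
        codes = d2.argmin(axis=1)
    return codes, codebook[codes]

def usage_entropy_penalty(codes, num_codes):
    """One illustrative regularizer: penalize low entropy of empirical code usage,
    pushing assignments towards fuller codebook utilization."""
    freq = np.bincount(codes, minlength=num_codes) / codes.size
    entropy = -np.sum(freq[freq > 0] * np.log(freq[freq > 0]))
    return np.log(num_codes) - entropy      # 0 when usage is uniform

# Toy latents and a small codebook.
z = rng.normal(size=(256, 8))
codebook = rng.normal(size=(16, 8))

codes, z_q = quantize(z, codebook, temperature=0.5)
recon_loss = np.mean((z - z_q) ** 2)                 # reconstruction/commitment term
reg_loss = usage_entropy_penalty(codes, len(codebook))
print(f"recon={recon_loss:.3f}  usage-penalty={reg_loss:.3f}")
```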
arXiv Detail & Related papers (2023-03-11T15:20:54Z) - Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success remains unclear.
We show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in heavy-tailed stationary behaviour in the parameters.
A detailed analysis is conducted in which we describe how key factors, including the step size and the data, drive this behaviour, with similar results holding for state-of-the-art neural network models.
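A toy stochastic recurrence makes the mechanism visible: when the contraction factor of an SGD-like update is itself random (multiplicative noise), the stationary distribution of the iterates becomes heavy-tailed, which an excess-kurtosis estimate picks up. The recurrence, parameter values, and diagnostic below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(multiplicative_std, additive_std=0.1, contraction=0.9, steps=20000):
    """Iterate x_{t+1} = a_t * x_t + b_t, a random recurrence of the kind used to
    model SGD-like updates: a_t carries the multiplicative noise, b_t the additive noise."""
    x = 0.0
    trace = np.empty(steps)
    for t in range(steps):
        a = contraction + multiplicative_std * rng.normal()
        b = additive_std * rng.normal()
        x = a * x + b
        trace[t] = x
    return trace[steps // 2:]   # discard burn-in

def excess_kurtosis(samples):
    z = (samples - samples.mean()) / samples.std()
    return np.mean(z ** 4) - 3.0   # ~0 for a Gaussian, large for heavy tails

additive_only = simulate(multiplicative_std=0.0)
with_mult_noise = simulate(multiplicative_std=0.35)
print("additive only   kurtosis:", round(excess_kurtosis(additive_only), 2))
print("multiplicative  kurtosis:", round(excess_kurtosis(with_mult_noise), 2))
```

With only additive noise the iteration is a Gaussian AR(1) process, so the excess kurtosis stays near zero; with a random contraction factor the same iteration produces markedly heavier tails.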
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.