SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond
- URL: http://arxiv.org/abs/2502.03367v1
- Date: Wed, 05 Feb 2025 17:05:25 GMT
- Title: SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond
- Authors: Madhav R. Muthyala, Farshud Sorourifar, You Peng, Joel A. Paulson
- Abstract summary: We introduce SyMANTIC, a novel Symbolic Regression (SR) algorithm.
SyMANTIC efficiently identifies low-dimensional descriptors from a large set of candidates.
We show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.
- Abstract: Symbolic regression (SR) is an emerging branch of machine learning focused on discovering simple and interpretable mathematical expressions from data. Although a wide variety of SR methods have been developed, they often face challenges such as high computational cost, poor scalability with respect to the number of input dimensions, fragility to noise, and an inability to balance accuracy and complexity. This work introduces SyMANTIC, a novel SR algorithm that addresses these challenges. SyMANTIC efficiently identifies (potentially several) low-dimensional descriptors from a large set of candidates (from $\sim 10^5$ to $\sim 10^{10}$ or more) through a unique combination of mutual information-based feature selection, adaptive feature expansion, and recursively applied $\ell_0$-based sparse regression. In addition, it employs an information-theoretic measure to produce an approximate set of Pareto-optimal equations, each offering the best-found accuracy for a given complexity. Furthermore, our open-source implementation of SyMANTIC, built on the PyTorch ecosystem, facilitates easy installation and GPU acceleration. We demonstrate the effectiveness of SyMANTIC across a range of problems, including synthetic examples, scientific benchmarks, real-world material property predictions, and chaotic dynamical system identification from small datasets. Extensive comparisons show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.
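The $\ell_0$-based sparse regression step described in the abstract can be illustrated with a minimal sketch: given a pool of expanded candidate features, exhaustively search all small feature subsets and fit each by least squares, keeping the subset with the lowest error. This is a generic best-subset search under stated assumptions, not the actual SyMANTIC implementation; all names and the toy data are hypothetical.

```python
import itertools
import math

def lstsq(A, y):
    """Least squares via normal equations + Gaussian elimination (small systems only)."""
    n, k = len(A), len(A[0])
    G = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(A[i][p] * y[i] for i in range(n)) for p in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(c, k), key=lambda r: abs(G[r][c]))
        G[c], G[piv] = G[piv], G[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = G[r][c] / G[c][c]
            for j in range(c, k):
                G[r][j] -= f * G[c][j]
            b[r] -= f * b[c]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (b[c] - sum(G[c][j] * x[j] for j in range(c + 1, k))) / G[c][c]
    return x

def l0_regression(features, y, max_terms=2):
    """Exhaustive search over feature subsets of size <= max_terms (the l0 constraint)."""
    best = None
    for size in range(1, max_terms + 1):
        for subset in itertools.combinations(features, size):
            A = [[1.0] + [features[f][i] for f in subset] for i in range(len(y))]
            coefs = lstsq(A, y)
            preds = [sum(c * a for c, a in zip(coefs, row)) for row in A]
            rmse = (sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)) ** 0.5
            if best is None or rmse < best[0]:
                best = (rmse, subset, coefs)
    return best

# Toy data: y = 2*x + 3*x^2, with a distractor feature in the candidate pool.
x = [0.5 * i for i in range(1, 9)]
features = {
    "x": x,
    "x^2": [v * v for v in x],
    "sin(x)": [math.sin(v) for v in x],
}
y = [2 * v + 3 * v * v for v in x]
rmse, terms, coefs = l0_regression(features, y, max_terms=2)
print(terms, [round(c, 3) for c in coefs])
```

The search recovers the true support ("x", "x^2") with coefficients close to (2, 3). SyMANTIC applies this kind of sparse fit recursively after mutual-information screening, so the exhaustive search only ever sees a small, pre-filtered candidate pool.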
Related papers
- The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective [55.15192437680943]
We study the sample complexity of online reinforcement learning for nonlinear dynamical systems with continuous state and action spaces.
Our algorithms are likely to be useful in practice, due to their simplicity, the ability to incorporate prior knowledge, and their benign transient behavior.
arXiv Detail & Related papers (2025-01-27T10:01:28Z) - Ab initio nonparametric variable selection for scalable Symbolic Regression with large $p$ [2.222138965069487]
Symbolic regression (SR) is a powerful technique for discovering symbolic expressions that characterize nonlinear relationships in data.
Existing SR methods do not scale to datasets with a large number of input variables, which are common in modern scientific applications.
We propose PAN+SR, which combines ab initio nonparametric variable selection with SR to efficiently pre-screen large input spaces.
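The pre-screening idea behind PAN+SR (and the mutual-information filter in SyMANTIC) can be sketched generically: score every candidate input against the target and keep only the top-ranked few before running symbolic regression. The sketch below uses absolute Pearson correlation as a simple stand-in for the papers' nonparametric or information-theoretic scores; all names and the synthetic data are hypothetical.

```python
import random

def screen_features(features, y, keep=2):
    """Rank candidate inputs by |Pearson correlation| with the target and
    keep only the top `keep` before symbolic regression."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        sa = sum((u - ma) ** 2 for u in a) ** 0.5
        sb = sum((v - mb) ** 2 for v in b) ** 0.5
        return cov / (sa * sb) if sa and sb else 0.0
    return sorted(features, key=lambda f: abs(corr(features[f], y)), reverse=True)[:keep]

# Synthetic example: one informative input among three pure-noise distractors.
random.seed(0)
n = 200
x1 = [random.uniform(-1, 1) for _ in range(n)]
features = {
    "x1": x1,
    "n1": [random.uniform(-1, 1) for _ in range(n)],
    "n2": [random.uniform(-1, 1) for _ in range(n)],
    "n3": [random.uniform(-1, 1) for _ in range(n)],
}
y = [3 * v + 0.5 for v in x1]
kept = screen_features(features, y, keep=1)
print(kept)
```

Because the screening cost grows linearly in the number of candidates, this step is what lets such methods scale to the large-$p$ settings both papers target.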
arXiv Detail & Related papers (2024-10-17T15:41:06Z) - TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator for Efficient and Interpretable Model Discovery [0.0]
Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models.
In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework.
We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks.
arXiv Detail & Related papers (2024-10-02T17:02:17Z) - Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a bottleneck in both accuracy and efficiency when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems [1.204357447396532]
Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments.
We propose a sample-efficient NS method inspired by Bayesian optimization principles.
We show that BEACON comprehensively outperforms existing baselines by finding substantially larger sets of diverse behaviors under limited sampling budgets.
arXiv Detail & Related papers (2024-06-05T20:23:52Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - S$Ω$I: Score-based O-INFORMATION Estimation [7.399561232927219]
We introduce S$\Omega$I, which allows for the first time to compute O-information without restrictive assumptions about the system.
Our experiments validate our approach on synthetic data, and demonstrate the effectiveness of S$\Omega$I in the context of a real-world use case.
arXiv Detail & Related papers (2024-02-08T13:38:23Z) - SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression [1.0356366043809717]
We propose SymbolNet, a neural network approach to symbolic regression specifically designed as a model compression technique.
This framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process.
arXiv Detail & Related papers (2024-01-18T12:51:38Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net [122.8907826672382]
We propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net.
arXiv Detail & Related papers (2021-03-22T05:19:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.