SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond
- URL: http://arxiv.org/abs/2502.03367v1
- Date: Wed, 05 Feb 2025 17:05:25 GMT
- Title: SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond
- Authors: Madhav R. Muthyala, Farshud Sorourifar, You Peng, Joel A. Paulson
- Abstract summary: We introduce SyMANTIC, a novel Symbolic Regression (SR) algorithm.
SyMANTIC efficiently identifies low-dimensional descriptors from a large set of candidates.
We show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.
- Abstract: Symbolic regression (SR) is an emerging branch of machine learning focused on discovering simple and interpretable mathematical expressions from data. Although a wide variety of SR methods have been developed, they often face challenges such as high computational cost, poor scalability with respect to the number of input dimensions, fragility to noise, and an inability to balance accuracy and complexity. This work introduces SyMANTIC, a novel SR algorithm that addresses these challenges. SyMANTIC efficiently identifies (potentially several) low-dimensional descriptors from a large set of candidates (from $\sim 10^5$ to $\sim 10^{10}$ or more) through a unique combination of mutual information-based feature selection, adaptive feature expansion, and recursively applied $\ell_0$-based sparse regression. In addition, it employs an information-theoretic measure to produce an approximate set of Pareto-optimal equations, each offering the best-found accuracy for a given complexity. Furthermore, our open-source implementation of SyMANTIC, built on the PyTorch ecosystem, facilitates easy installation and GPU acceleration. We demonstrate the effectiveness of SyMANTIC across a range of problems, including synthetic examples, scientific benchmarks, real-world material property predictions, and chaotic dynamical system identification from small datasets. Extensive comparisons show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.
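The $\ell_0$-based sparse regression step described in the abstract can be illustrated with a minimal sketch: given a pool of expanded candidate features, exhaustively search all small feature subsets and fit each by least squares, keeping the subset with the lowest error. This is a generic best-subset search under stated assumptions, not the actual SyMANTIC implementation; all names and the toy data are hypothetical.

```python
import itertools
import math

def lstsq(A, y):
    """Least squares via normal equations + Gaussian elimination (small systems only)."""
    n, k = len(A), len(A[0])
    G = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(A[i][p] * y[i] for i in range(n)) for p in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(c, k), key=lambda r: abs(G[r][c]))
        G[c], G[piv] = G[piv], G[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = G[r][c] / G[c][c]
            for j in range(c, k):
                G[r][j] -= f * G[c][j]
            b[r] -= f * b[c]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (b[c] - sum(G[c][j] * x[j] for j in range(c + 1, k))) / G[c][c]
    return x

def l0_regression(features, y, max_terms=2):
    """Exhaustive search over feature subsets of size <= max_terms (the l0 constraint)."""
    best = None
    for size in range(1, max_terms + 1):
        for subset in itertools.combinations(features, size):
            A = [[1.0] + [features[f][i] for f in subset] for i in range(len(y))]
            coefs = lstsq(A, y)
            preds = [sum(c * a for c, a in zip(coefs, row)) for row in A]
            rmse = (sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)) ** 0.5
            if best is None or rmse < best[0]:
                best = (rmse, subset, coefs)
    return best

# Toy data: y = 2*x + 3*x^2, with a distractor feature in the candidate pool.
x = [0.5 * i for i in range(1, 9)]
features = {
    "x": x,
    "x^2": [v * v for v in x],
    "sin(x)": [math.sin(v) for v in x],
}
y = [2 * v + 3 * v * v for v in x]
rmse, terms, coefs = l0_regression(features, y, max_terms=2)
print(terms, [round(c, 3) for c in coefs])
```

The search recovers the true support ("x", "x^2") with coefficients close to (2, 3). SyMANTIC applies this kind of sparse fit recursively after mutual-information screening, so the exhaustive search only ever sees a small, pre-filtered candidate pool.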
Related papers
- The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective [55.15192437680943]
We study the sample complexity of online reinforcement learning for nonlinear dynamical systems with continuous state and action spaces.
Our algorithms are likely to be useful in practice, due to their simplicity, the ability to incorporate prior knowledge, and their benign transient behavior.
arXiv Detail & Related papers (2025-01-27T10:01:28Z) - Ab initio nonparametric variable selection for scalable Symbolic Regression with large $p$ [2.222138965069487]
Symbolic regression (SR) is a powerful technique for discovering symbolic expressions that characterize nonlinear relationships in data.
Existing SR methods do not scale to datasets with a large number of input variables, which are common in modern scientific applications.
We propose PAN+SR, which combines ab initio nonparametric variable selection with SR to efficiently pre-screen large input spaces.
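The pre-screening idea behind PAN+SR (and the mutual-information filter in SyMANTIC) can be sketched generically: score every candidate input against the target and keep only the top-ranked few before running symbolic regression. The sketch below uses absolute Pearson correlation as a simple stand-in for the papers' nonparametric or information-theoretic scores; all names and the synthetic data are hypothetical.

```python
import random

def screen_features(features, y, keep=2):
    """Rank candidate inputs by |Pearson correlation| with the target and
    keep only the top `keep` before symbolic regression."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        sa = sum((u - ma) ** 2 for u in a) ** 0.5
        sb = sum((v - mb) ** 2 for v in b) ** 0.5
        return cov / (sa * sb) if sa and sb else 0.0
    return sorted(features, key=lambda f: abs(corr(features[f], y)), reverse=True)[:keep]

# Synthetic example: one informative input among three pure-noise distractors.
random.seed(0)
n = 200
x1 = [random.uniform(-1, 1) for _ in range(n)]
features = {
    "x1": x1,
    "n1": [random.uniform(-1, 1) for _ in range(n)],
    "n2": [random.uniform(-1, 1) for _ in range(n)],
    "n3": [random.uniform(-1, 1) for _ in range(n)],
}
y = [3 * v + 0.5 for v in x1]
kept = screen_features(features, y, keep=1)
print(kept)
```

Because the screening cost grows linearly in the number of candidates, this step is what lets such methods scale to the large-$p$ settings both papers target.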
arXiv Detail & Related papers (2024-10-17T15:41:06Z) - TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator for Efficient and Interpretable Model Discovery [0.0]
Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models.
In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework.
We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks.
arXiv Detail & Related papers (2024-10-02T17:02:17Z) - Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a bottleneck in both accuracy and efficiency when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems [1.204357447396532]
Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments.
We propose a sample-efficient NS method inspired by Bayesian optimization principles.
We show that BEACON comprehensively outperforms existing baselines by finding substantially larger sets of diverse behaviors under limited sampling budgets.
arXiv Detail & Related papers (2024-06-05T20:23:52Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - S$Ω$I: Score-based O-INFORMATION Estimation [7.399561232927219]
We introduce S$\Omega$I, which allows for the first time to compute O-information without restrictive assumptions about the system.
Our experiments validate our approach on synthetic data, and demonstrate the effectiveness of S$\Omega$I in the context of a real-world use case.
arXiv Detail & Related papers (2024-02-08T13:38:23Z) - SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression [1.0356366043809717]
We propose SymbolNet, a neural network approach to symbolic regression specifically designed as a model compression technique.
This framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process.
arXiv Detail & Related papers (2024-01-18T12:51:38Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net [122.8907826672382]
We propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net.
arXiv Detail & Related papers (2021-03-22T05:19:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.