Incorporating Background Knowledge in Symbolic Regression using a
Computer Algebra System
- URL: http://arxiv.org/abs/2301.11919v2
- Date: Thu, 4 May 2023 14:52:27 GMT
- Title: Incorporating Background Knowledge in Symbolic Regression using a
Computer Algebra System
- Authors: Charles Fox, Neil Tran, Nikki Nacion, Samiha Sharlin, and Tyler R.
Josephson
- Abstract summary: Symbolic Regression (SR) can generate interpretable, concise expressions that fit a given dataset.
We specifically examine the addition of constraints to traditional genetic algorithm (GA) based SR (PySR) as well as a Markov-chain Monte Carlo (MCMC) based Bayesian SR architecture.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic Regression (SR) can generate interpretable, concise expressions that
fit a given dataset, allowing for more human understanding of the structure
than black-box approaches. The addition of background knowledge (in the form of
symbolic mathematical constraints) allows for the generation of expressions
that are meaningful with respect to theory while also being consistent with
data. We specifically examine the addition of constraints to traditional
genetic algorithm (GA) based SR (PySR) as well as a Markov-chain Monte Carlo
(MCMC) based Bayesian SR architecture (Bayesian Machine Scientist), and apply
these to rediscovering adsorption equations from experimental, historical
datasets. We find that, while hard constraints prevent GA and MCMC SR from
searching, soft constraints can lead to improved performance both in terms of
search effectiveness and model meaningfulness, with computational costs
increasing by about an order-of-magnitude. If the constraints do not correlate
well with the dataset or expected models, they can hinder the search for
expressions. We find that Bayesian SR is better suited to incorporating these
constraints (as the Bayesian prior) than the GA is to incorporating them by
modifying its fitness function.
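To make the soft-constraint idea concrete, here is a minimal sketch (not the paper's implementation) of folding background knowledge into a GA fitness function with a computer algebra system: SymPy checks a monotonicity constraint on each candidate, and violations are penalized rather than forbidden. The penalty weight and the Langmuir-style example are illustrative assumptions.

```python
import numpy as np
import sympy as sp

x = sp.symbols("x", positive=True)

def soft_constraint_fitness(expr, X, y, weight=10.0):
    """Data loss plus a soft penalty for violating a monotonicity
    constraint (d expr / dx >= 0), checked with a CAS.

    `expr` is a SymPy expression in x; `weight` is a hypothetical
    penalty strength -- the exact scheme here is illustrative.
    """
    f = sp.lambdify(x, expr, "numpy")
    mse = np.mean((f(X) - y) ** 2)  # data-fit term

    # Background knowledge: adsorption should not decrease with pressure.
    dfdx = sp.lambdify(x, sp.diff(expr, x), "numpy")
    violation = np.mean(np.maximum(0.0, -dfdx(X)))  # how negative the slope gets

    return mse + weight * violation  # soft constraint: penalize, don't forbid

# Example: a Langmuir-type candidate a*x/(1+b*x) satisfies the constraint.
a, b = 2.0, 0.5
candidate = a * x / (1 + b * x)
X = np.linspace(0.1, 10, 50)
y = 2.0 * X / (1 + 0.5 * X) + np.random.normal(0, 0.05, X.size)
print(soft_constraint_fitness(candidate, X, y))
```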
Related papers
- Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients [20.941908494137806]
This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery.
Despite its success, the state-of-the-art method DSR is built on recurrent neural networks and is guided purely by data fitness.
We use transformers in conjunction with breadth-first search to improve learning performance.
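For context, a minimal numpy sketch of the risk-seeking policy gradient idea that this line of work builds on: only episodes whose reward clears the top (1 - epsilon) quantile contribute to the REINFORCE-style update. The function name and toy rewards are illustrative.

```python
import numpy as np

def risk_seeking_weights(rewards, epsilon=0.05):
    """Risk-seeking policy gradient filter: keep only episodes whose
    reward exceeds the (1 - epsilon) quantile, so the policy optimizes
    best-case rather than average-case performance.
    Returns per-episode advantage weights for a REINFORCE-style update."""
    r = np.asarray(rewards, dtype=float)
    threshold = np.quantile(r, 1.0 - epsilon)  # reward quantile R_epsilon
    return np.where(r > threshold, r - threshold, 0.0)

# toy usage: 1000 sampled expressions with random fitness rewards
rewards = np.random.rand(1000)
w = risk_seeking_weights(rewards, epsilon=0.05)
print(f"{(w > 0).sum()} of {len(rewards)} episodes contribute to the update")
```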
arXiv Detail & Related papers (2024-06-10T19:29:10Z)
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that Empirical Risk Minimization (ERM) achieves sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- A Transformer Model for Symbolic Regression towards Scientific Discovery [11.827358526480323]
Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets.
We propose a new Transformer model for Symbolic Regression, focused in particular on applications to Scientific Discovery.
Applying our best model to the SRSD datasets yields state-of-the-art results under the normalized tree-based edit distance.
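A sketch of how a normalized tree-based edit distance can be computed between expression trees, here with the zss (Zhang-Shasha) library; normalizing by the larger tree's size is an assumption, not necessarily the exact SRSD metric.

```python
# pip install zss  -- Zhang-Shasha tree edit distance
from zss import Node, simple_distance

def tree_size(node):
    return 1 + sum(tree_size(c) for c in node.children)

def normalized_tree_edit_distance(t1, t2):
    """Tree edit distance divided by the size of the larger tree,
    so 0 means identical trees and values near 1 mean unrelated ones.
    (The exact normalization used for SRSD is assumed here.)"""
    return simple_distance(t1, t2) / max(tree_size(t1), tree_size(t2))

# expression trees for x*(x+1) and x*x
t1 = Node("*").addkid(Node("x")).addkid(Node("+").addkid(Node("x")).addkid(Node("1")))
t2 = Node("*").addkid(Node("x")).addkid(Node("x"))
print(normalized_tree_edit_distance(t1, t2))
```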
arXiv Detail & Related papers (2023-12-07T06:27:48Z)
- RSRM: Reinforcement Symbolic Regression Machine [13.084113582897965]
We propose a novel Reinforcement Symbolic Regression Machine (RSRM) that is capable of uncovering complex math equations from only scarce data.
The RSRM model is composed of three key modules:
(1) a Monte Carlo tree search (MCTS) agent that explores optimal math expression trees consisting of pre-defined math operators and variables;
(2) a Double Q-learning block that helps reduce the feasible search space of MCTS by properly modeling the reward distribution;
(3) a modulated sub-tree discovery block that distills new math operators to improve the representation ability of math expression trees.
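As a concrete reference for module (2), here is a minimal tabular Double Q-learning update (van Hasselt, 2010); the encoding of expression-search states and actions is abstracted away and all names are illustrative.

```python
import random
from collections import defaultdict

# Two independent Q tables; using one to select the best next action and
# the other to evaluate it removes vanilla Q-learning's overestimation bias.
QA = defaultdict(float)
QB = defaultdict(float)

def double_q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Double Q-learning step. In RSRM this kind of value
    estimate guides which expression sub-trees MCTS should expand."""
    if random.random() < 0.5:
        best = max(actions, key=lambda a2: QA[(s_next, a2)])  # select with A
        QA[(s, a)] += alpha * (r + gamma * QB[(s_next, best)] - QA[(s, a)])  # evaluate with B
    else:
        best = max(actions, key=lambda a2: QB[(s_next, a2)])  # select with B
        QB[(s, a)] += alpha * (r + gamma * QA[(s_next, best)] - QB[(s, a)])  # evaluate with A

# toy usage: states are partial expressions, actions are operator choices
double_q_update("x+", "sin", r=0.3, s_next="x+sin", actions=["sin", "cos", "*"])
```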
arXiv Detail & Related papers (2023-05-24T02:51:45Z)
- Active Learning in Symbolic Regression with Physical Constraints [0.4037357056611557]
Evolutionary symbolic regression (SR) fits a symbolic equation to data, which gives a concise interpretable model.
We explore using SR as a method to propose which data to gather in an active learning setting with physical constraints.
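A sketch of the query-by-committee flavor of this idea: fit several symbolic models that explain the current data comparably well, then propose the input where their predictions diverge most. The committee members and the disagreement measure are illustrative assumptions.

```python
import numpy as np

def propose_next_point(committee, candidates):
    """Query-by-committee: among candidate inputs, pick the one where
    the committee of symbolic models disagrees most (max variance).
    `committee` is a list of callables f(x) -- e.g. expressions found
    by an SR run that fit the existing data comparably well."""
    preds = np.array([[f(x) for f in committee] for x in candidates])
    disagreement = preds.var(axis=1)
    return candidates[int(np.argmax(disagreement))]

# toy committee: three expressions that agree near x=0, diverge elsewhere
committee = [lambda x: x / (1 + x),
             lambda x: np.tanh(x),
             lambda x: 1 - np.exp(-x)]
grid = np.linspace(0.0, 5.0, 101)
print("next experiment at x =", propose_next_point(committee, grid))
```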
arXiv Detail & Related papers (2023-05-17T17:07:25Z)
- Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search [29.392036559507755]
Symbolic regression is a problem of learning a symbolic expression from numerical data.
Deep neural models trained on procedurally-generated synthetic datasets showed competitive performance.
We propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure.
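For reference, a minimal sketch of the UCT selection rule at the core of any MCTS-based expression search; the learned generative prior this paper adds on top is omitted, and the node encoding is illustrative.

```python
import math

def uct_select(children, c=1.4):
    """Pick the child maximizing the UCT score: exploit high mean
    reward, explore rarely-visited nodes. `children` maps an action
    (e.g. the next token of an expression) to (visits, total_reward)."""
    total_visits = sum(v for v, _ in children.values())
    def score(item):
        visits, reward = item[1]
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        return reward / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children.items(), key=score)[0]

# toy node: choosing the next operator while growing an expression tree
children = {"+": (10, 6.0), "sin": (3, 2.4), "exp": (0, 0.0)}
print(uct_select(children))  # -> "exp" (unvisited)
```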
arXiv Detail & Related papers (2023-02-22T09:10:20Z)
- Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network [111.67744771462873]
This paper proposes a novel explainable model for sparse Knowledge Graphs (KGs).
It combines high-order reasoning into a graph convolutional network, namely HoGRN.
It can not only improve the generalization ability to mitigate the information insufficiency issue but also provide interpretability.
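A minimal numpy sketch of the graph-convolutional propagation step that HoGRN builds on; the relation-aware weighting that provides its high-order reasoning and interpretability is abstracted away.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: average neighbor features through the
    symmetrically normalized adjacency, then apply a learned linear map
    and ReLU. Stacking steps like this lets entity updates draw on
    multi-hop (high-order) neighborhoods."""
    A_hat = A + np.eye(A.shape[0])  # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

# toy KG with 4 entities, 8-dim features, 8->8 weight matrix
rng = np.random.default_rng(0)
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
X, W = rng.normal(size=(4, 8)), rng.normal(size=(8, 8))
print(gcn_layer(A, X, W).shape)  # (4, 8)
```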
arXiv Detail & Related papers (2022-07-14T10:16:56Z)
- Multi-task Learning of Order-Consistent Causal Graphs [59.9575145128345]
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs).
In a multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models.
We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order.
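A sketch of an $l_1/l_2$ (group-lasso) penalty coupling edge weights across the $K$ tasks; the Gaussian likelihood term and the optimizer are omitted, so this illustrates the regularizer rather than the authors' estimator verbatim.

```python
import numpy as np

def l1_l2_penalty(B, lam=0.1):
    """Group-lasso penalty for K linear structural equation models.
    B has shape (K, d, d): B[k, i, j] is the edge weight i -> j in task k.
    Each edge's K task-specific weights form one group, so the penalty
    zeroes out (or keeps) an edge jointly across tasks -- which is what
    encourages a shared causal order."""
    group_norms = np.linalg.norm(B, axis=0)  # l2 over tasks, per edge
    return lam * group_norms.sum()           # l1 over edges

# toy: 3 tasks, 5 variables, ~20% shared sparsity pattern
rng = np.random.default_rng(1)
B = rng.normal(size=(3, 5, 5)) * (rng.random((1, 5, 5)) < 0.2)
print(l1_l2_penalty(B))
```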
arXiv Detail & Related papers (2021-11-03T22:10:18Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), in which we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
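For a concrete picture, here is the 1-D Hyvarinen score-matching objective that this family of divergences generalizes, with a toy Gaussian model whose score is written out analytically; AR-CSM applies an objective of this kind to each conditional of an autoregressive factorization.

```python
import numpy as np

def gaussian_score(x, mu=0.0, sigma=1.0):
    """Score s(x) = d/dx log p(x) of a Gaussian model; note the
    normalizing constant never appears -- the point of score matching."""
    return -(x - mu) / sigma**2

def hyvarinen_loss(xs, mu=0.0, sigma=1.0):
    """Hyvarinen objective E[ s'(x) + 0.5 s(x)^2 ]: minimized when the
    model score matches the data score, with no sampling and no
    partition function."""
    s = gaussian_score(xs, mu, sigma)
    s_prime = -1.0 / sigma**2  # derivative of the score, here constant
    return np.mean(s_prime + 0.5 * s**2)

xs = np.random.normal(size=10_000)
print(hyvarinen_loss(xs))  # ~ -0.5 when the model matches the data
```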
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Robust Compressed Sensing using Generative Models [98.64228459705859]
In this paper we propose an algorithm inspired by the Median-of-Means (MOM).
Our algorithm guarantees recovery for heavy-tailed data, even in the presence of outliers.
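A minimal numpy sketch of the Median-of-Means estimator that inspires the algorithm: block the sample, average within blocks, take the median of the block means; a single gross outlier corrupts at most one block.

```python
import numpy as np

def median_of_means(x, n_blocks=10):
    """Median-of-Means: split the sample into blocks, average each block,
    return the median of the block means. One heavy-tailed outlier can
    corrupt at most one block, so the median stays close to the truth --
    the robustness property the recovery guarantee is built around."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(0)
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return np.median([b.mean() for b in blocks])

# toy: mean of clean data is ~0; one gross outlier barely moves MoM
data = np.concatenate([np.random.normal(0, 1, 999), [1e6]])
print("plain mean:", data.mean())            # dragged to ~1000
print("MoM:       ", median_of_means(data))  # still near 0
```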
arXiv Detail & Related papers (2020-06-16T19:07:41Z)