AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity
- URL: http://arxiv.org/abs/2006.10782v2
- Date: Wed, 16 Dec 2020 17:58:47 GMT
- Title: AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity
- Authors: Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, Max Tegmark
- Abstract summary: We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal.
It improves on the previous state-of-the-art by typically being orders of magnitude more robust toward noise and bad data.
We develop a method for discovering generalized symmetries from gradient properties of a neural network fit.
- Score: 8.594811303203581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an improved method for symbolic regression that seeks to fit data
to formulas that are Pareto-optimal, in the sense of having the best accuracy
for a given complexity. It improves on the previous state-of-the-art by
typically being orders of magnitude more robust toward noise and bad data, and
also by discovering many formulas that stumped previous methods. We develop a
method for discovering generalized symmetries (arbitrary modularity in the
computational graph of a formula) from gradient properties of a neural network
fit. We use normalizing flows to generalize our symbolic regression method to
probability distributions from which we only have samples, and employ
statistical hypothesis testing to accelerate robust brute-force search.
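To make the Pareto-optimality criterion concrete, here is a minimal, generic sketch (not the authors' code; the candidate formulas, complexities, and errors below are made up) of keeping only formulas for which no other candidate is both simpler and more accurate:

```python
from typing import List, Tuple

def pareto_front(candidates: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Return the (formula, complexity, error) candidates on the accuracy-complexity Pareto front."""
    front = []
    for name, comp, err in candidates:
        dominated = any(
            c2 <= comp and e2 <= err and (c2 < comp or e2 < err)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append((name, comp, err))
    return sorted(front, key=lambda t: t[1])  # sort by complexity so the front reads as a trade-off curve

# Hypothetical candidates: (formula, description length in bits, fit error)
candidates = [
    ("a*x",           8.0, 0.31),
    ("a*x + b",      12.0, 0.12),
    ("a*sin(x) + b", 18.0, 0.02),
    ("a*x**2 + b*x", 20.0, 0.05),  # dominated by the sine formula
]
for formula, comp, err in pareto_front(candidates):
    print(f"{formula:15s} complexity={comp:5.1f} bits  error={err:.3f}")
```

The graph-modularity idea can be illustrated with one of its simplest special cases: a function is additively separable, f(x, y) = g(x) + h(y), exactly when its mixed partial derivative vanishes. The sketch below is a hedged illustration of that test on a neural-network fit, not the paper's algorithm; `f_net`, the sample points, and the simple mean-score summary are all assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical surrogate; in practice this network would first be trained on the data.
f_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def mixed_partial(f, xy):
    """Estimate d^2 f / (dx dy) at each row of xy (shape [N, 2]) via autograd."""
    xy = xy.clone().requires_grad_(True)
    out = f(xy).sum()
    grad = torch.autograd.grad(out, xy, create_graph=True)[0]  # columns: df/dx, df/dy
    return torch.autograd.grad(grad[:, 0].sum(), xy)[0][:, 1]  # d/dy of df/dx

points = torch.rand(256, 2)
score = mixed_partial(f_net, points).abs().mean().item()
print(f"mean |d2f/dxdy| = {score:.4f}  (near zero suggests f(x, y) ~ g(x) + h(y))")
```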
Related papers
- Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation [14.194212772887699]
We consider meta-learning within the framework of high-dimensional random-effects linear models.
We show the precise behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task.
We propose and analyze an estimator of the inverse covariance of the random regression coefficients, based on data from the training tasks.
arXiv Detail & Related papers (2024-03-27T21:18:43Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - ResMem: Learn what you can and memorize the rest [79.19649788662511]
We propose the residual-memorization (ResMem) algorithm to augment an existing prediction model.
By construction, ResMem can explicitly memorize the training labels.
We show that ResMem consistently improves the test set generalization of the original prediction model.
arXiv Detail & Related papers (2023-02-03T07:12:55Z) - Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z) - A Precise Performance Analysis of Support Vector Regression [105.94855998235232]
We study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements.
Our results are then used to optimally tune the parameters intervening in the design of hard and soft support vector regression algorithms.
arXiv Detail & Related papers (2021-05-21T14:26:28Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Random Machines Regression Approach: an ensemble support vector regression model with free kernel choice [0.0]
In this article we propose a procedure for applying the bagged-weighted support vector model to regression problems.
The results show that Regression Random Machines achieve lower generalization error without needing to choose the best kernel function during the tuning process.
arXiv Detail & Related papers (2020-03-27T21:30:59Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.