Interpretable Machine Learning for Science with PySR and
SymbolicRegression.jl
- URL: http://arxiv.org/abs/2305.01582v3
- Date: Fri, 5 May 2023 17:44:07 GMT
- Title: Interpretable Machine Learning for Science with PySR and
SymbolicRegression.jl
- Authors: Miles Cranmer (Princeton University and Flatiron Institute)
- Abstract summary: PySR is an open-source library for practical symbolic regression.
It is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages.
In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: PySR is an open-source library for practical symbolic regression, a type of
machine learning which aims to discover human-interpretable symbolic models.
PySR was developed to democratize and popularize symbolic regression for the
sciences, and is built on a high-performance distributed back-end, a flexible
search algorithm, and interfaces with several deep learning packages. PySR's
internal search algorithm is a multi-population evolutionary algorithm, which
consists of a unique evolve-simplify-optimize loop, designed for optimization
of unknown scalar constants in newly-discovered empirical expressions. PySR's
backend is the extremely optimized Julia library SymbolicRegression.jl, which
can be used directly from Julia. It is capable of fusing user-defined operators
into SIMD kernels at runtime, performing automatic differentiation, and
distributing populations of expressions to thousands of cores across a cluster.
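For context, here is a minimal usage sketch of the Python front-end (keyword arguments and defaults vary across PySR versions, so treat this as illustrative rather than canonical). It also shows a user-defined operator supplied as a Julia snippet, the kind of operator the backend can fuse into its compiled kernels:

```python
import numpy as np
from pysr import PySRRegressor

# Toy dataset: y = 2.54 * cos(x_3) + x_0^2 - 0.5
X = np.random.randn(100, 5)
y = 2.54 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

model = PySRRegressor(
    niterations=40,
    populations=15,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=[
        "cos",
        "square(x) = x^2",  # user-defined operator, written as Julia code
    ],
    # Map the custom operator into SymPy so exported equations stay symbolic:
    extra_sympy_mappings={"square": lambda x: x**2},
)
model.fit(X, y)
print(model.sympy())  # best discovered expression, as a SymPy object
```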
In describing this software, we also introduce a new benchmark,
"EmpiricalBench," to quantify the applicability of symbolic regression
algorithms in science. This benchmark measures recovery of historical empirical
equations from original and synthetic datasets.
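The abstract does not spell out the exact recovery criterion, but one hedged sketch of how symbolic recovery can be scored is an equivalence check in SymPy. The `recovers` helper below is hypothetical, not part of PySR or EmpiricalBench; it only illustrates the idea:

```python
import sympy as sp

# Hypothetical illustration of a recovery check: a discovered expression
# "recovers" the target if their difference simplifies to zero.
def recovers(discovered: sp.Expr, target: sp.Expr) -> bool:
    return sp.simplify(discovered - target) == 0

a = sp.symbols("a", positive=True)
kepler = a ** sp.Rational(3, 2)  # Kepler's third law: P proportional to a^(3/2)
candidate = sp.sqrt(a**3)        # algebraically equivalent form a search might return

print(recovers(candidate, kepler))  # True
```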
Related papers
- Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Scalable Sparse Regression for Model Discovery: The Fast Lane to Insight [0.0]
Sparse regression applied to symbolic libraries has quickly emerged as a powerful tool for learning governing equations directly from data.
I present a general-purpose, model-agnostic sparse regression algorithm that extends a recently proposed exhaustive search.
It is intended to maintain sensitivity to small coefficients and to be of reasonable computational cost for large symbolic libraries (a generic sketch of this library-plus-sparse-regression idea appears after this list).
arXiv Detail & Related papers (2024-05-14T18:09:43Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - A Transformer Model for Symbolic Regression towards Scientific Discovery [11.827358526480323]
Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets.
We propose a new Transformer model for Symbolic Regression, with a particular focus on applications in Scientific Discovery.
We apply our best model to the SRSD datasets, where it yields state-of-the-art results under the normalized tree-based edit distance.
arXiv Detail & Related papers (2023-12-07T06:27:48Z) - Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
arXiv Detail & Related papers (2023-06-07T18:30:25Z) - Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search [29.392036559507755]
Symbolic regression is a problem of learning a symbolic expression from numerical data.
Deep neural models trained on procedurally-generated synthetic datasets showed competitive performance.
We propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure.
arXiv Detail & Related papers (2023-02-22T09:10:20Z) - Efficient Generator of Mathematical Expressions for Symbolic Regression [0.0]
We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE.
HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space.
Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
arXiv Detail & Related papers (2023-02-20T10:40:29Z) - A Precise Performance Analysis of Support Vector Regression [105.94855998235232]
We study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements.
Our results are then used to optimally tune the parameters intervening in the design of hard and soft support vector regression algorithms.
arXiv Detail & Related papers (2021-05-21T14:26:28Z) - High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z) - Picasso: A Sparse Learning Library for High Dimensional Data Analysis in
R and Python [77.33905890197269]
We describe a new library which implements a unified pathwise coordinate optimization for a variety of sparse learning problems.
The library is coded in C++ and has user-friendly R and Python wrappers.
arXiv Detail & Related papers (2020-06-27T02:39:24Z) - Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
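Referring back to the sparse-regression entry above: the following is a generic sketch of the library-plus-sparse-regression idea, using sequentially thresholded least squares over a fixed library of candidate terms (in the spirit of SINDy-style model discovery). It is not the specific algorithm of that paper:

```python
import numpy as np

# Recover dx/dt = -2x + 0.5x^3 from samples of x and dx/dt by sparse
# regression over a symbolic library of candidate terms.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 500)
dxdt = -2.0 * x + 0.5 * x**3

# Candidate library of symbolic terms: [1, x, x^2, x^3]
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Sequentially thresholded least squares: refit, zero out small coefficients.
xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.1
    xi[small] = 0.0
    big = ~small
    xi[big] = np.linalg.lstsq(Theta[:, big], dxdt, rcond=None)[0]

print(xi)  # approximately [0, -2.0, 0, 0.5]
```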