Vertical Symbolic Regression
- URL: http://arxiv.org/abs/2312.11955v1
- Date: Tue, 19 Dec 2023 08:55:47 GMT
- Title: Vertical Symbolic Regression
- Authors: Nan Jiang, Md Nasim, Yexiang Xue
- Abstract summary: Learning symbolic expressions from experimental data is a vital step in AI-driven scientific discovery.
We propose Vertical Symbolic Regression (VSR) to expedite symbolic regression.
- Score: 18.7083987727973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating scientific discovery has been a grand goal of Artificial
Intelligence (AI) and will bring tremendous societal impact. Learning symbolic
expressions from experimental data is a vital step in AI-driven scientific
discovery. Despite exciting progress, most endeavors have focused on horizontal
discovery paths, i.e., they search directly for the best expression in the full
hypothesis space involving all the independent variables. Horizontal paths are
challenging because this hypothesis space grows exponentially with the number
of independent variables. We propose Vertical Symbolic Regression (VSR) to
expedite symbolic regression. VSR starts by fitting
simple expressions involving a few independent variables under controlled
experiments where the remaining variables are held constant. It then extends
the expressions learned in previous rounds by adding new independent variables
and running new control-variable experiments in which these variables are
allowed to vary.
The first few steps in vertical discovery are significantly cheaper than the
horizontal path, as their search is in reduced hypothesis spaces involving a
small set of variables. As a consequence, vertical discovery has the potential
to supercharge state-of-the-art symbolic regression approaches in handling
complex equations with many contributing factors. Theoretically, we show that
the search space of VSR can be exponentially smaller than that of horizontal
approaches when learning a class of expressions. Experimentally, VSR
outperforms several baselines in learning symbolic expressions involving many
independent variables.
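To make the vertical procedure concrete, below is a minimal Python sketch of the round-by-round idea on an assumed toy law y = x1*x2 + x3. The per-round "search" is reduced to a least-squares fit purely for illustration; it is not the paper's actual search procedure.

```python
# A minimal sketch of the vertical discovery loop (illustrative, not the paper's code).
# Assumed ground truth for this toy example: y = x1 * x2 + x3.
import numpy as np

rng = np.random.default_rng(0)

def experiment(x1, x2, x3):
    """Oracle for controlled experiments: free variables vary, others are held fixed."""
    return x1 * x2 + x3

# Round 1: only x1 varies; x2 and x3 are controlled (held at constants).
c2, c3 = 2.0, 5.0
x1 = rng.uniform(-1, 1, 50)
y = experiment(x1, c2, c3)
# Search in the reduced space of expressions in x1 alone, here a linear form a*x1 + b.
a, b = np.polyfit(x1, y, 1)
print(f"round 1: y ≈ {a:.2f}*x1 + {b:.2f}")   # recovers a ≈ c2, b ≈ c3

# Round 2: let x2 vary too; the round-1 expression seeds the extended hypothesis.
x2 = rng.uniform(-1, 1, 50)
y = experiment(x1, x2, c3)
# Fit the extended hypothesis y = a*x1*x2 + b, still a small reduced space.
A = np.column_stack([x1 * x2, np.ones_like(x1)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"round 2: y ≈ {coef[0]:.2f}*x1*x2 + {coef[1]:.2f}")
```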
Related papers
- Multi-View Symbolic Regression [1.2334534968968969]
We present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously.
MvSR fits the evaluated expression to each independent dataset and returns a parametric family of functions.
We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics.
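As a rough illustration of the multi-view idea, the sketch below refits one shared functional form to each dataset, letting only the constants differ per view. The exponential family and the SciPy fitting routine are assumptions for illustration, not the paper's setup.

```python
# A minimal sketch of the multi-view idea: one shared expression, per-dataset constants.
import numpy as np
from scipy.optimize import curve_fit

def family(x, a, b):
    """Assumed candidate parametric family f(x; a, b) = a * exp(b * x)."""
    return a * np.exp(b * x)

rng = np.random.default_rng(1)
datasets = []
for a_true, b_true in [(1.0, 0.5), (2.0, -0.3)]:   # two "views" of the same law
    x = rng.uniform(0, 2, 40)
    y = family(x, a_true, b_true) + 0.01 * rng.normal(size=40)
    datasets.append((x, y))

# The same expression must fit every view; only the constants may differ.
for i, (x, y) in enumerate(datasets):
    (a, b), _ = curve_fit(family, x, y, p0=(1.0, 0.0))
    print(f"view {i}: a={a:.2f}, b={b:.2f}")
```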
arXiv Detail & Related papers (2024-02-06T15:53:49Z) - Vertical Symbolic Regression via Deep Policy Gradient [18.7083987727973]
We propose Vertical Symbolic Regression using Deep Policy Gradient (VSR-DPG)
Our VSR-DPG models symbolic regression as a sequential decision-making process, in which equations are built from repeated applications of grammar rules.
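The sequential decision process can be pictured as repeatedly choosing grammar rules until an expression is complete. In the sketch below the rule choice is uniform random; VSR-DPG would place a learned policy at that decision point. The toy grammar is an assumption.

```python
# A minimal sketch of building an equation by sequential grammar-rule choices.
import random

GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["x1"], ["x2"], ["C"]],
}

def sample_expression(max_depth=4):
    def expand(symbol, depth):
        if symbol not in GRAMMAR:          # terminal symbol: emit as-is
            return symbol
        rules = GRAMMAR[symbol]
        if depth >= max_depth:             # force terminal rules near the depth cap
            rules = [r for r in rules if all(s not in GRAMMAR for s in r)]
        rule = random.choice(rules)        # a trained policy would score rules here
        return "".join(expand(s, depth + 1) for s in rule)
    return expand("E", 0)

random.seed(0)
print(sample_expression())   # e.g. an expression such as "x1*C+x2"
```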
arXiv Detail & Related papers (2024-02-01T00:54:48Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
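A rough sketch of the first step as summarized above: fit a neural surrogate to freely observed data, then query it with some variables pinned to constants, emulating control-variable experiments for downstream symbolic search. The scikit-learn model and the toy target are assumptions, not ScaleSR's actual pipeline.

```python
# A minimal sketch: a DNN surrogate acts as the oracle for controlled queries.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 3))
y = X[:, 0] * X[:, 1] + X[:, 2]            # assumed unknown law, observed with full variation

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X, y)

# Controlled query: vary x1 only, hold x2 and x3 at chosen constants.
x1 = np.linspace(-1, 1, 20)
X_ctrl = np.column_stack([x1, np.full(20, 0.5), np.full(20, 0.2)])
y_ctrl = surrogate.predict(X_ctrl)          # data for a reduced symbolic search in x1
print(y_ctrl[:3].round(2))
```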
arXiv Detail & Related papers (2023-06-07T18:30:25Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Symbolic Regression via Control Variable Genetic Programming [24.408477700506907]
We propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables.
CVGP expedites symbolic expression discovery via customized experiment design.
We show CVGP as an incremental building approach can yield an exponential reduction in the search space when learning a class of expressions.
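A back-of-the-envelope illustration of the reduction claim, under an assumed counting model (not the paper's formal analysis): a joint search over n variables with b candidate forms per variable scales like b^n, while n incremental rounds cost roughly b each.

```python
# Assumed counting model: b candidate forms per variable.
for n in (5, 10, 15):
    b = 4                       # assumed branching factor per variable
    print(f"n={n:>2}: horizontal ~ {b**n:>12,}  vertical ~ {b * n:>4}")
```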
arXiv Detail & Related papers (2023-05-25T04:11:14Z) - Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
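In code, SSFT reduces to a scan over per-epoch correctness records: the forgetting time of an example is the epoch at which it turns incorrect and never recovers. A minimal sketch, with the correctness matrix assumed as given:

```python
# A minimal sketch of second-split forgetting time from per-epoch correctness.
import numpy as np

def ssft(correct):                       # correct: (n_epochs, n_examples) bool matrix
    n_epochs, n_examples = correct.shape
    times = np.full(n_examples, np.inf)  # inf = never forgotten
    for j in range(n_examples):
        wrong_since = None
        for t in range(n_epochs):
            if correct[t, j]:
                wrong_since = None       # still remembered at epoch t
            elif wrong_since is None:
                wrong_since = t          # candidate forgetting epoch
        if wrong_since is not None:
            times[j] = wrong_since
    return times

# Example: column 0 is forgotten at epoch 2; column 1 is never forgotten.
correct = np.array([[True, True], [True, True], [False, True], [False, True]])
print(ssft(correct))                     # [ 2. inf]
```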
arXiv Detail & Related papers (2022-10-26T21:03:46Z) - Orthonormal Convolutions for the Rotation Based Iterative
Gaussianization [64.44661342486434]
This paper elaborates an extension of rotation-based iterative Gaussianization, RBIG, which makes image Gaussianization possible.
In images its application has been restricted to small image patches or isolated pixels, because rotation in RBIG is based on principal or independent component analysis.
We present Convolutional RBIG: an extension that alleviates this issue by imposing that the rotation in RBIG is a convolution.
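For orientation, a plain RBIG iteration alternates a rotation with marginal Gaussianization, as in the sketch below (PCA rotation, empirical-CDF Gaussianization). The paper's contribution, replacing that rotation with a convolution, is not implemented here.

```python
# A minimal sketch of one plain RBIG iteration (not the convolutional variant).
import numpy as np
from scipy.stats import norm, rankdata

def rbig_step(X):
    # 1. Rotate with PCA (eigenvectors of the sample covariance).
    _, V = np.linalg.eigh(np.cov(X, rowvar=False))
    X = X @ V
    # 2. Gaussianize each marginal via its empirical CDF.
    n = X.shape[0]
    for j in range(X.shape[1]):
        u = rankdata(X[:, j]) / (n + 1)   # empirical CDF values in (0, 1)
        X[:, j] = norm.ppf(u)
    return X

rng = np.random.default_rng(3)
X = rng.exponential(size=(1000, 2))       # clearly non-Gaussian input
for _ in range(5):
    X = rbig_step(X)                       # repeated steps approach a Gaussian
print("marginal mean ≈ 0, std ≈ 1:", X.mean(axis=0).round(2), X.std(axis=0).round(2))
```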
arXiv Detail & Related papers (2022-06-08T12:56:34Z) - Chaos is a Ladder: A New Theoretical Understanding of Contrastive
Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z) - Systematic Evaluation of Causal Discovery in Visual Model Based
Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure.
Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs.
In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)