Symbolic Regression via Control Variable Genetic Programming
- URL: http://arxiv.org/abs/2306.08057v1
- Date: Thu, 25 May 2023 04:11:14 GMT
- Title: Symbolic Regression via Control Variable Genetic Programming
- Authors: Nan Jiang, Yexiang Xue
- Abstract summary: We propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables.
CVGP expedites symbolic expression discovery via customized experiment design.
We show that CVGP, as an incremental building approach, can yield an exponential reduction in the search space when learning a class of expressions.
- Score: 24.408477700506907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning symbolic expressions directly from experiment data is a vital step
in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches
are limited to learning simple expressions. Regressing expressions involving
many independent variables still remains out of reach. Motivated by the control
variable experiments widely utilized in science, we propose Control Variable
Genetic Programming (CVGP) for symbolic regression over many independent
variables. CVGP expedites symbolic expression discovery via customized
experiment design, rather than learning from a fixed dataset collected a
priori. CVGP starts by fitting simple expressions involving a small set of
independent variables using genetic programming, under controlled experiments
where other variables are held as constants. It then extends expressions
learned in previous generations by adding new independent variables, using new
control variable experiments in which these variables are allowed to vary.
Theoretically, we show that CVGP, as an incremental building approach, can yield
an exponential reduction in the search space when learning a class of expressions.
Experimentally, CVGP outperforms several baselines in learning symbolic
expressions involving multiple independent variables.
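The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names (`oracle`, `run_trial`) and the toy target `y = 3*x1*x2 + 2*x1` are illustrative assumptions, and full CVGP uses genetic programming over expression trees rather than the linear fits used here. The sketch only shows the control-variable idea: first fit a simple model with `x2` held constant, then let `x2` vary and extend the learned structure.

```python
import random

# Hypothetical ground-truth process that the designed "experiments" query;
# the learner never sees this function directly.
def oracle(x1, x2):
    return 3.0 * x1 * x2 + 2.0 * x1

def run_trial(x2_const, n=50):
    """Controlled experiment: x1 varies freely while x2 is held constant."""
    xs = [random.uniform(1.0, 2.0) for _ in range(n)]
    ys = [oracle(x, x2_const) for x in xs]
    # With x2 frozen at c, y = (3*c + 2) * x1, so a slope through the
    # origin captures the data exactly; fit it by least squares.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Stage 1: two controlled experiments at different constants for x2,
# each yielding a simple one-variable model y = slope * x1.
c1, c2 = 1.0, 4.0
s1, s2 = run_trial(c1), run_trial(c2)

# Stage 2: allow x2 to vary and model how the fitted slope depends on it:
# slope(c) = a*c + b, recovering y = a*x1*x2 + b*x1.
a = (s2 - s1) / (c2 - c1)
b = s1 - a * c1
print(f"recovered: y = {a:.2f}*x1*x2 + {b:.2f}*x1")
```

Because each stage only searches over the currently free variable, the candidate space stays small at every step; this is the intuition behind the exponential search-space reduction claimed for the incremental approach.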
Related papers
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z) - Multi-View Symbolic Regression [1.2334534968968969]
We present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously.
MvSR fits the evaluated expression to each independent dataset and returns a parametric family of functions.
We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics.
arXiv Detail & Related papers (2024-02-06T15:53:49Z) - Data-driven path collective variables [0.0]
We propose a new method for the generation, optimization, and comparison of collective variables.
The resulting collective variable is one-dimensional, interpretable, and differentiable.
We demonstrate the validity of the method on two different applications.
arXiv Detail & Related papers (2023-12-21T14:07:47Z) - Vertical Symbolic Regression [18.7083987727973]
Learning symbolic expressions from experimental data is a vital step in AI-driven scientific discovery.
We propose Vertical Symbolic Regression (VSR) to expedite symbolic regression.
arXiv Detail & Related papers (2023-12-19T08:55:47Z) - Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
arXiv Detail & Related papers (2023-06-07T18:30:25Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Collective variable discovery in the age of machine learning: reality, hype and everything in between [0.0]
Molecular dynamics simulation has been routinely used to understand the kinetics and molecular recognition of biomolecules.
In physical chemistry, these low-dimensional variables are often called collective variables.
In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones.
arXiv Detail & Related papers (2021-12-06T17:58:53Z) - MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data [59.26381272149325]
We present an unsupervised random forest for representing data with disparate variable types.
MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random.
We show that using our approach, we can visualize and classify data more accurately than competing approaches.
arXiv Detail & Related papers (2021-11-19T22:02:21Z) - Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtle spurious correlations in training data, induced by non-causal variables, for prediction.
We propose a conditional independence test based algorithm that separates causal variables given a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.