Symbolic Regression via Control Variable Genetic Programming
- URL: http://arxiv.org/abs/2306.08057v1
- Date: Thu, 25 May 2023 04:11:14 GMT
- Title: Symbolic Regression via Control Variable Genetic Programming
- Authors: Nan Jiang, Yexiang Xue
- Abstract summary: We propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables.
CVGP expedites symbolic expression discovery via customized experiment design.
We show that CVGP, as an incremental building approach, can yield an exponential reduction in the search space when learning a class of expressions.
- Score: 24.408477700506907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning symbolic expressions directly from experiment data is a vital step
in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches
are limited to learning simple expressions. Regressing expressions involving
many independent variables still remains out of reach. Motivated by the control
variable experiments widely utilized in science, we propose Control Variable
Genetic Programming (CVGP) for symbolic regression over many independent
variables. CVGP expedites symbolic expression discovery via customized
experiment design, rather than learning from a fixed dataset collected a
priori. CVGP starts by fitting simple expressions involving a small set of
independent variables using genetic programming, under controlled experiments
where other variables are held as constants. It then extends expressions
learned in previous generations by adding new independent variables, using new
control variable experiments in which these variables are allowed to vary.
Theoretically, we show that CVGP, as an incremental building approach, can yield
an exponential reduction in the search space when learning a class of expressions.
Experimentally, CVGP outperforms several baselines in learning symbolic
expressions involving multiple independent variables.
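The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names (`oracle`, `run_trial`) and the toy target `y = 3*x1*x2 + 2*x1` are illustrative assumptions, and full CVGP uses genetic programming over expression trees rather than the linear fits used here. The sketch only shows the control-variable idea: first fit a simple model with `x2` held constant, then let `x2` vary and extend the learned structure.

```python
import random

# Hypothetical ground-truth process that the designed "experiments" query;
# the learner never sees this function directly.
def oracle(x1, x2):
    return 3.0 * x1 * x2 + 2.0 * x1

def run_trial(x2_const, n=50):
    """Controlled experiment: x1 varies freely while x2 is held constant."""
    xs = [random.uniform(1.0, 2.0) for _ in range(n)]
    ys = [oracle(x, x2_const) for x in xs]
    # With x2 frozen at c, y = (3*c + 2) * x1, so a slope through the
    # origin captures the data exactly; fit it by least squares.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Stage 1: two controlled experiments at different constants for x2,
# each yielding a simple one-variable model y = slope * x1.
c1, c2 = 1.0, 4.0
s1, s2 = run_trial(c1), run_trial(c2)

# Stage 2: allow x2 to vary and model how the fitted slope depends on it:
# slope(c) = a*c + b, recovering y = a*x1*x2 + b*x1.
a = (s2 - s1) / (c2 - c1)
b = s1 - a * c1
print(f"recovered: y = {a:.2f}*x1*x2 + {b:.2f}*x1")
```

Because each stage only searches over the currently free variable, the candidate space stays small at every step; this is the intuition behind the exponential search-space reduction claimed for the incremental approach.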
Related papers
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z) - Multi-View Symbolic Regression [1.2334534968968969]
We present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously.
MvSR fits the evaluated expression to each independent dataset and returns a parametric family of functions.
We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics.
arXiv Detail & Related papers (2024-02-06T15:53:49Z) - Data-driven path collective variables [0.0]
We propose a new method for the generation, optimization, and comparison of collective variables.
The resulting collective variable is one-dimensional, interpretable, and differentiable.
We demonstrate the validity of the method on two different applications.
arXiv Detail & Related papers (2023-12-21T14:07:47Z) - Vertical Symbolic Regression [18.7083987727973]
Learning symbolic expressions from experimental data is a vital step in AI-driven scientific discovery.
We propose Vertical Symbolic Regression (VSR) to expedite symbolic regression.
arXiv Detail & Related papers (2023-12-19T08:55:47Z) - Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
arXiv Detail & Related papers (2023-06-07T18:30:25Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Collective variable discovery in the age of machine learning: reality, hype and everything in between [0.0]
Molecular dynamics simulation has been routinely used to understand the kinetics and molecular recognition of biomolecules.
In physical chemistry, these low-dimensional variables are often called collective variables.
In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones.
arXiv Detail & Related papers (2021-12-06T17:58:53Z) - MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data [59.26381272149325]
We present an unsupervised random forest for representing data with disparate variable types.
MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random.
We show that using our approach, we can visualize and classify data more accurately than competing approaches.
arXiv Detail & Related papers (2021-11-19T22:02:21Z) - Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtle spurious correlations in training data, induced by non-causal variables, for prediction.
We propose a conditional independence test based algorithm that separates causal variables given a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.