Active Learning Improves Performance on Symbolic Regression Tasks in
StackGP
- URL: http://arxiv.org/abs/2202.04708v1
- Date: Wed, 9 Feb 2022 20:05:22 GMT
- Title: Active Learning Improves Performance on Symbolic Regression Tasks in
StackGP
- Authors: Nathan Haut, Wolfgang Banzhaf, Bill Punch
- Abstract summary: We introduce an active learning method for symbolic regression using StackGP.
We use the Feynman AI benchmark set of equations to examine the ability of our method to find appropriate models using fewer data points.
- Score: 2.7685408681770247
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper we introduce an active learning method for symbolic regression
using StackGP. The approach begins with a small number of data points for
StackGP to model. To improve the model the system incrementally adds a data
point such that the new point maximizes prediction uncertainty as measured by
the model ensemble. Symbolic regression is re-run with the larger data set.
This cycle continues until the system satisfies a termination criterion. We use
the Feynman AI benchmark set of equations to examine the ability of our method
to find appropriate models using fewer data points. The approach was found to
successfully rediscover 72 of the 100 Feynman equations using as few data
points as possible, and without use of domain expertise or data translation.
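The cycle described in the abstract (fit an ensemble, add the candidate point where the ensemble disagrees most, refit) can be sketched generically. This is an illustrative sketch only, not StackGP's actual implementation: the function names, the standard-deviation uncertainty measure, and the fixed point budget are all assumptions for the example.

```python
import numpy as np

def ensemble_uncertainty(models, X):
    """Uncertainty at each candidate point, measured as the standard
    deviation of the ensemble members' predictions."""
    preds = np.stack([m(X) for m in models])   # shape (n_models, n_points)
    return preds.std(axis=0)

def active_learning_loop(fit_ensemble, candidates, targets, n_init=4,
                         max_points=20, tol=1e-8, seed=0):
    """Start from a few random points; repeatedly refit the ensemble and
    add the candidate point that maximizes prediction uncertainty."""
    rng = np.random.default_rng(seed)
    idx = [int(i) for i in rng.choice(len(candidates), size=n_init,
                                      replace=False)]
    while True:
        # Refit the model ensemble on the currently selected points
        # (in the paper this is a fresh symbolic regression run).
        models = fit_ensemble(candidates[idx], targets[idx])
        remaining = [i for i in range(len(candidates)) if i not in idx]
        if not remaining or len(idx) >= max_points:
            break                       # candidate pool or budget exhausted
        unc = ensemble_uncertainty(models, candidates[remaining])
        if unc.max() < tol:             # termination: ensemble agrees everywhere
            break
        idx.append(remaining[int(unc.argmax())])
    return idx, models
```

As a toy stand-in for symbolic regression runs, `fit_ensemble` can return polynomial fits of several degrees; the loop then concentrates new points where the degrees disagree.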
Related papers
- What should an AI assessor optimise for? [57.96463917842822]
An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system.
Here we address the question: is it always optimal to train the assessor for the target metric?
We experimentally explore this question for regression losses and classification scores with monotonic and non-monotonic mappings, respectively.
arXiv Detail & Related papers (2025-02-01T08:41:57Z) - Global dense vector representations for words or items using shared parameter alternating Tweedie model [9.104044534664672]
We present a model for analyzing co-occurrence count data from practical settings, such as user-item or item-item data from online shopping platforms.
Data contain important information for developing recommender systems or studying relevance of items or words from non-numerical sources.
arXiv Detail & Related papers (2024-12-31T19:49:32Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes replaying data from experienced tasks when learning new tasks.
However, this is often impractical due to memory constraints or data privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Online Symbolic Regression with Informative Query [23.684346197490605]
We propose QUOSR, a framework for online symbolic regression.
At each step, QUOSR receives historical data points, generates a new $\vec{x}$, and then queries the symbolic expression to get the corresponding $y$.
We show that QUOSR can facilitate modern symbolic regression methods by generating informative data.
arXiv Detail & Related papers (2023-02-21T09:13:48Z) - Mutual Information Learned Regressor: an Information-theoretic Viewpoint
of Training Regression Systems [10.314518385506007]
An existing common practice for solving regression problems is the mean square error (MSE) minimization approach.
Recently, Yi et al. proposed a mutual information based supervised learning framework in which they introduced a label entropy regularization.
In this paper, we investigate regression under the mutual information based supervised learning framework.
arXiv Detail & Related papers (2022-11-23T03:43:22Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing
Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z) - Inference of stochastic time series with missing data [5.7656096606054374]
Inferring dynamics from time series is an important objective in data analysis.
We propose an expectation-maximization (EM) algorithm that iterates between two steps: the E-step restores missing data points, while the M-step infers an underlying network model.
We find that demanding equal consistency of observed and missing data points provides an effective stopping criterion.
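A minimal numerical sketch of such an E/M alternation, substituting a scalar AR(1) model for the paper's network model (the function name, the AR(1) choice, and the fixed iteration count are illustrative assumptions; the paper's consistency-based stopping criterion is omitted):

```python
import numpy as np

def em_impute_ar1(y, missing, n_iter=50):
    """Toy EM for a scalar AR(1) model y[t] = a*y[t-1] + noise:
    alternate between imputing missing points (E-step) and
    re-estimating the coefficient a by least squares (M-step)."""
    y = y.copy()
    y[missing] = y[~missing].mean()       # crude initialization
    a = 0.0
    for _ in range(n_iter):
        # M-step: least-squares estimate of a from the current series.
        a = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
        # E-step: replace each missing point by the mean of the
        # forward and backward one-step predictions.
        for t in np.where(missing)[0]:
            preds = []
            if t > 0:
                preds.append(a * y[t - 1])     # forward prediction
            if t < len(y) - 1 and a != 0:
                preds.append(y[t + 1] / a)     # backward prediction
            y[t] = np.mean(preds)
    return a, y
```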
arXiv Detail & Related papers (2021-01-28T04:56:59Z) - S^3-Rec: Self-Supervised Learning for Sequential Recommendation with
Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z) - AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph
modularity [8.594811303203581]
We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal.
It improves on the previous state-of-the-art by typically being orders of magnitude more robust toward noise and bad data.
We develop a method for discovering generalized symmetries from gradient properties of a neural network fit.
arXiv Detail & Related papers (2020-06-18T18:01:19Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data points are dependent and sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Optimal Feature Manipulation Attacks Against Linear Regression [64.54500628124511]
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or modifying the original data points.
Given an energy budget, we first provide the closed-form solution for the optimal poisoning data point when the target is modifying one designated regression coefficient.
We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while keeping the changes to the others as small as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)
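The coefficient-manipulation effect described in the last entry can be illustrated numerically. This toy sketch simply adds one high-leverage point and is not the paper's closed-form optimal solution; the data, the poisoning point `x_p`, and its response are invented for the example.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.normal(size=100)

beta_clean = ols(X, y)                 # close to [1.0, 2.0]

# Hypothetical poisoning point aimed at the first coefficient: large
# magnitude in feature 0, with a response that drags beta[0] upward.
x_p = np.array([[5.0, 0.0]])
y_p = np.array([10.0])                 # implies slope 2.0 along feature 0
beta_poisoned = ols(np.vstack([X, x_p]), np.append(y, y_p))
```

A single such point noticeably shifts the targeted coefficient while leaving the other nearly unchanged, which is the attack surface the paper formalizes.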
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.