Online Symbolic Regression with Informative Query
- URL: http://arxiv.org/abs/2302.10539v1
- Date: Tue, 21 Feb 2023 09:13:48 GMT
- Title: Online Symbolic Regression with Informative Query
- Authors: Pengwei Jin, Di Huang, Rui Zhang, Xing Hu, Ziyuan Nan, Zidong Du, Qi Guo, Yunji Chen
- Abstract summary: We propose QUOSR, a framework for online symbolic regression.
At each step, QUOSR receives historical data points, generates new $\vx$, and then queries the symbolic expression to get the corresponding $y$.
We show that QUOSR can facilitate modern symbolic regression methods by generating informative data.
- Score: 23.684346197490605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression, the task of extracting mathematical expressions from the
observed data $\{ \vx_i, y_i \}$, plays a crucial role in scientific discovery.
Despite the promising performance of existing methods, most of them conduct
symbolic regression in an \textit{offline} setting. That is, they treat the
observed data points as given ones that are simply sampled from uniform
distributions without exploring the expressive potential of data. However, for
real-world scientific problems, the data used for symbolic regression are
usually actively obtained by doing experiments, which is an \textit{online}
setting. Thus, how to obtain informative data that can facilitate the symbolic
regression process is an important problem that remains challenging.
In this paper, we propose QUOSR, a \textbf{qu}ery-based framework for
\textbf{o}nline \textbf{s}ymbolic \textbf{r}egression that can automatically
obtain informative data in an iterative manner. Specifically, at each step,
QUOSR receives historical data points, generates new $\vx$, and then queries
the symbolic expression to get the corresponding $y$, where the $(\vx, y)$
serves as new data points. This process repeats until the maximum number of
query steps is reached. To make the generated data points informative, we
implement the framework with a neural network and train it by maximizing the
mutual information between generated data points and the target expression.
Through comprehensive experiments, we show that QUOSR can facilitate modern
symbolic regression methods by generating informative data.
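The abstract describes an iterative query protocol. The snippet below is a minimal Python sketch of that loop, not the authors' implementation: `QueryModel`, `oracle`, and the other names are hypothetical stand-ins, and the random proposer is only a placeholder for QUOSR's neural query network, which the paper trains by maximizing mutual information between the generated data points and the target expression.

```python
# Minimal sketch of the online query protocol described in the abstract.
# `query_model` proposes the next input x from the history; `oracle` stands in
# for the unknown symbolic expression being queried. All names are illustrative.
import numpy as np

def online_query(query_model, oracle, x_init, max_steps=50):
    """Iteratively grow a dataset by querying the hidden expression."""
    history = [(x, oracle(x)) for x in x_init]      # seed data points
    for _ in range(max_steps):
        x_new = query_model.propose(history)        # generate a (hopefully informative) x
        y_new = oracle(x_new)                       # query the expression for y
        history.append((x_new, y_new))              # (x, y) becomes a new data point
    return history                                  # fed to a downstream SR solver

# Toy usage with a random-query baseline and a made-up "unknown" expression.
class RandomQuery:
    def propose(self, history):
        return np.random.uniform(-5.0, 5.0, size=2)

data = online_query(RandomQuery(), lambda x: np.sin(x[0]) + x[1] ** 2,
                    x_init=[np.zeros(2)], max_steps=10)
print(len(data), "data points collected")
```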
Related papers
- Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$, and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a ...
arXiv Detail & Related papers (2024-06-01T07:33:41Z) - Generalized Regression with Conditional GANs [2.4171019220503402]
We propose to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset.
We show that this approach to regression makes fewer assumptions on the distribution of the data we are fitting to and, therefore, has better representation capabilities.
arXiv Detail & Related papers (2024-04-21T01:27:47Z) - A Transformer Model for Symbolic Regression towards Scientific Discovery [11.827358526480323]
Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets.
We propose a new Transformer model aiming at Symbolic Regression particularly focused on its application for Scientific Discovery.
We apply our best model to the SRSD datasets, yielding state-of-the-art results under the normalized tree-based edit distance.
arXiv Detail & Related papers (2023-12-07T06:27:48Z) - TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility for improving performance via data sculpting/filtering in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z) - Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
arXiv Detail & Related papers (2023-06-07T18:30:25Z) - Online Active Regression [8.397196353612042]
We consider an online extension of the active regression problem: the learner receives data points one by one and decides whether it should collect the corresponding labels.
The goal is to efficiently maintain the regression of received data points with a small budget of label queries.
arXiv Detail & Related papers (2022-07-13T03:53:25Z) - Active Learning Improves Performance on Symbolic Regression Tasks in StackGP [2.7685408681770247]
We introduce an active learning method for symbolic regression using StackGP.
We use the Feynman AI benchmark set of equations to examine the ability of our method to find appropriate models using fewer data points.
arXiv Detail & Related papers (2022-02-09T20:05:22Z) - Oblivious sketching for logistic regression [72.42202783677811]
We present the first data oblivious sketch for logistic regression.
Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.
arXiv Detail & Related papers (2021-07-14T11:29:26Z) - Neural Symbolic Regression that Scales [58.45115548924735]
We introduce the first symbolic regression method that leverages large scale pre-training.
We procedurally generate an unbounded set of equations, and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output pairs.
arXiv Detail & Related papers (2021-06-11T14:35:22Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
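The last entry represents a piecewise linear function as a difference of two convex (max-affine) functions. The sketch below is a toy illustration of that representation only, not the fitting procedure from the paper: it fits $f(x) = \max_i(a_i^\top x + b_i) - \max_j(c_j^\top x + d_j)$ by plain gradient descent in PyTorch, and all names and hyperparameters are made up.

```python
# Toy illustration: a piecewise linear function expressed as a difference of
# convex (max-affine) functions, fitted by gradient descent. Not the paper's method.
import torch

class DCPiecewiseLinear(torch.nn.Module):
    def __init__(self, in_dim, k=8):
        super().__init__()
        self.f = torch.nn.Linear(in_dim, k)   # affine pieces of the first convex part
        self.g = torch.nn.Linear(in_dim, k)   # affine pieces of the second convex part

    def forward(self, x):
        # max over affine functions is convex; the difference is piecewise linear
        return self.f(x).max(dim=-1).values - self.g(x).max(dim=-1).values

# Fit |x| - 0.5*x on a 1-D grid as a quick sanity check.
x = torch.linspace(-2, 2, 200).unsqueeze(1)
y = x.abs().squeeze(1) - 0.5 * x.squeeze(1)
model = DCPiecewiseLinear(1, k=4)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()
print(float(loss))
```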