Online Symbolic Regression with Informative Query
- URL: http://arxiv.org/abs/2302.10539v1
- Date: Tue, 21 Feb 2023 09:13:48 GMT
- Title: Online Symbolic Regression with Informative Query
- Authors: Pengwei Jin, Di Huang, Rui Zhang, Xing Hu, Ziyuan Nan, Zidong Du, Qi Guo, Yunji Chen
- Abstract summary: We propose QUOSR, a framework for online symbolic regression.
At each step, QUOSR receives historical data points, generates new $\vx$, and then queries the symbolic expression to get the corresponding $y$.
We show that QUOSR can facilitate modern symbolic regression methods by generating informative data.
- Score: 23.684346197490605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression, the task of extracting mathematical expressions from the
observed data $\{ \vx_i, y_i \}$, plays a crucial role in scientific discovery.
Despite the promising performance of existing methods, most of them conduct
symbolic regression in an \textit{offline} setting. That is, they treat the
observed data points as given ones that are simply sampled from uniform
distributions without exploring the expressive potential of data. However, for
real-world scientific problems, the data used for symbolic regression are
usually actively obtained by doing experiments, which is an \textit{online}
setting. Thus, how to obtain informative data that can facilitate the symbolic
regression process is an important problem that remains challenging.
In this paper, we propose QUOSR, a \textbf{qu}ery-based framework for
\textbf{o}nline \textbf{s}ymbolic \textbf{r}egression that can automatically
obtain informative data in an iterative manner. Specifically, at each step,
QUOSR receives historical data points, generates new $\vx$, and then queries
the symbolic expression to get the corresponding $y$, where the $(\vx, y)$
serves as new data points. This process repeats until the maximum number of
query steps is reached. To make the generated data points informative, we
implement the framework with a neural network and train it by maximizing the
mutual information between generated data points and the target expression.
Through comprehensive experiments, we show that QUOSR can facilitate modern
symbolic regression methods by generating informative data.
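The abstract describes an iterative query protocol. The snippet below is a minimal Python sketch of that loop, not the authors' implementation: `QueryModel`, `oracle`, and the other names are hypothetical stand-ins, and the random proposer is only a placeholder for QUOSR's neural query network, which the paper trains by maximizing mutual information between the generated data points and the target expression.

```python
# Minimal sketch of the online query protocol described in the abstract.
# `query_model` proposes the next input x from the history; `oracle` stands in
# for the unknown symbolic expression being queried. All names are illustrative.
import numpy as np

def online_query(query_model, oracle, x_init, max_steps=50):
    """Iteratively grow a dataset by querying the hidden expression."""
    history = [(x, oracle(x)) for x in x_init]      # seed data points
    for _ in range(max_steps):
        x_new = query_model.propose(history)        # generate a (hopefully informative) x
        y_new = oracle(x_new)                       # query the expression for y
        history.append((x_new, y_new))              # (x, y) becomes a new data point
    return history                                  # fed to a downstream SR solver

# Toy usage with a random-query baseline and a made-up "unknown" expression.
class RandomQuery:
    def propose(self, history):
        return np.random.uniform(-5.0, 5.0, size=2)

data = online_query(RandomQuery(), lambda x: np.sin(x[0]) + x[1] ** 2,
                    x_init=[np.zeros(2)], max_steps=10)
print(len(data), "data points collected")
```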
Related papers
- Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$, and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a ...
arXiv Detail & Related papers (2024-06-01T07:33:41Z) - Generalized Regression with Conditional GANs [2.4171019220503402]
We propose to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset.
We show that this approach to regression makes fewer assumptions on the distribution of the data we are fitting to and, therefore, has better representation capabilities.
arXiv Detail & Related papers (2024-04-21T01:27:47Z) - A Transformer Model for Symbolic Regression towards Scientific Discovery [11.827358526480323]
Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets.
We propose a new Transformer model aiming at Symbolic Regression particularly focused on its application for Scientific Discovery.
We apply our best model to the SRSD datasets, yielding state-of-the-art results under the normalized tree-based edit distance.
arXiv Detail & Related papers (2023-12-07T06:27:48Z) - TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility for improving performance via data sculpting/filtering in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z) - Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
arXiv Detail & Related papers (2023-06-07T18:30:25Z) - Online Active Regression [8.397196353612042]
We consider an online extension of the active regression problem: the learner receives data points one by one and decides whether it should collect the corresponding labels.
The goal is to efficiently maintain the regression of received data points with a small budget of label queries.
arXiv Detail & Related papers (2022-07-13T03:53:25Z) - Active Learning Improves Performance on Symbolic Regression Tasks in StackGP [2.7685408681770247]
We introduce an active learning method for symbolic regression using StackGP.
We use the Feynman AI benchmark set of equations to examine the ability of our method to find appropriate models using fewer data points.
arXiv Detail & Related papers (2022-02-09T20:05:22Z) - Oblivious sketching for logistic regression [72.42202783677811]
We present the first data oblivious sketch for logistic regression.
Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.
arXiv Detail & Related papers (2021-07-14T11:29:26Z) - Neural Symbolic Regression that Scales [58.45115548924735]
We introduce the first symbolic regression method that leverages large scale pre-training.
We procedurally generate an unbounded set of equations, and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output pairs.
arXiv Detail & Related papers (2021-06-11T14:35:22Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
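The last entry represents a piecewise linear function as a difference of two convex (max-affine) functions. The sketch below is a toy illustration of that representation only, not the fitting procedure from the paper: it fits $f(x) = \max_i(a_i^\top x + b_i) - \max_j(c_j^\top x + d_j)$ by plain gradient descent in PyTorch, and all names and hyperparameters are made up.

```python
# Toy illustration: a piecewise linear function expressed as a difference of
# convex (max-affine) functions, fitted by gradient descent. Not the paper's method.
import torch

class DCPiecewiseLinear(torch.nn.Module):
    def __init__(self, in_dim, k=8):
        super().__init__()
        self.f = torch.nn.Linear(in_dim, k)   # affine pieces of the first convex part
        self.g = torch.nn.Linear(in_dim, k)   # affine pieces of the second convex part

    def forward(self, x):
        # max over affine functions is convex; the difference is piecewise linear
        return self.f(x).max(dim=-1).values - self.g(x).max(dim=-1).values

# Fit |x| - 0.5*x on a 1-D grid as a quick sanity check.
x = torch.linspace(-2, 2, 200).unsqueeze(1)
y = x.abs().squeeze(1) - 0.5 * x.squeeze(1)
model = DCPiecewiseLinear(1, k=4)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()
print(float(loss))
```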