Online Active Regression
- URL: http://arxiv.org/abs/2207.05945v1
- Date: Wed, 13 Jul 2022 03:53:25 GMT
- Title: Online Active Regression
- Authors: Cheng Chen, Yi Li, Yiming Sun
- Abstract summary: We consider an online extension of the active regression problem: the learner receives data points one by one and decides whether it should collect the corresponding labels.
The goal is to efficiently maintain the regression of received data points with a small budget of label queries.
- Score: 8.397196353612042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active regression considers a linear regression problem where the learner
receives a large number of data points but can only observe a small number of
labels. Since online algorithms can handle incremental training data at low
computational cost, we consider an online extension of the active regression
problem: the learner receives data points one by one and must immediately
decide whether to collect the corresponding label. The
goal is to efficiently maintain the regression of received data points with a
small budget of label queries. We propose novel algorithms for this problem
under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate
solution, our proposed algorithms require only
$\tilde{\mathcal{O}}(\epsilon^{-2} d \log(n\kappa))$ label queries, where
$n$ is the number of data points and $\kappa$ is the condition number of the
data points. Numerical experiments corroborate our theoretical results and
show that our methods perform comparably to offline active regression
algorithms.
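As a rough illustration of the abstract above, the sketch below shows one plausible way to decide label queries online for $p=2$: maintain the Gram matrix of all points seen so far, query each incoming label with probability proportional to the point's online (ridge) leverage score, and solve a reweighted least-squares problem on the queried subsample. The constant `c`, the ridge term, and the final solve are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def online_active_regression(stream, c=10.0, reg=1e-6):
    """Hedged sketch of online leverage-score label sampling (p = 2).
    `stream` yields (x, y) pairs; reading y counts as one label query."""
    M = None                     # running Gram matrix of all points seen so far
    rows, labels = [], []
    for x, y in stream:
        if M is None:
            M = reg * np.eye(x.shape[0])
        # online (ridge) leverage score of x against the points seen so far
        tau = float(x @ np.linalg.solve(M + np.outer(x, x), x))
        p = min(1.0, c * tau)
        if np.random.rand() < p:
            rows.append(x / np.sqrt(p))       # importance-weight the queried pair
            labels.append(y / np.sqrt(p))
        M += np.outer(x, x)
    # least-squares fit on the reweighted subsample
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(labels), rcond=None)
    return w
```

Since online leverage scores are known to sum to $O(d \log(n\kappa))$, taking $c = \Theta(\epsilon^{-2})$ would give an expected query count of the same order as the $\tilde{\mathcal{O}}(\epsilon^{-2} d \log(n\kappa))$ bound stated above.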
Related papers
- Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression.
In particular, we consider the Poisson generalized linear model with identity- and square-root link functions.
arXiv Detail & Related papers (2024-10-30T10:09:05Z)
- Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$, and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a …
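As a loose illustration of the sampling distribution in this entry (not the turnstile-stream algorithm itself), the following sketch draws rows with probability proportional to $\|a_i\|_p^p$ in the offline setting; the function name and interface are assumptions for illustration.

```python
import numpy as np

def lp_row_sample(A, m, p=1.0, rng=None):
    """Draw m row indexes of A with probability proportional to ||a_i||_p^p.
    Offline analogue only; the paper handles turnstile updates and also
    returns perturbed rows with estimated sampling probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.sum(np.abs(A) ** p, axis=1)   # ||a_i||_p^p for each row
    probs = weights / weights.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    return idx, A[idx]
```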
arXiv Detail & Related papers (2024-06-01T07:33:41Z)
- Online Algorithms with Limited Data Retention [4.101276693704335]
We introduce a model of online algorithms subject to strict constraints on data retention.
We show it is possible to obtain an exponential improvement over a baseline algorithm that retains all data as long as possible.
One implication of our results is that data retention laws are insufficient to guarantee the right to be forgotten.
arXiv Detail & Related papers (2024-04-17T02:17:23Z)
- Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
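For orientation, here is a minimal Frank-Wolfe sketch for least squares constrained to an $\ell_1$ ball, the basic iteration the entry above builds on; the paper's privacy noise and sparse-input optimizations are omitted, and all names here are illustrative.

```python
import numpy as np

def frank_wolfe_l1(A, b, radius, n_iters=200):
    """Frank-Wolfe for min ||Ax - b||_2^2 subject to ||x||_1 <= radius."""
    n, d = A.shape
    x = np.zeros(d)
    for t in range(n_iters):
        grad = A.T @ (A @ x - b)            # gradient of the least-squares loss
        i = int(np.argmax(np.abs(grad)))    # linear minimization over the l1 ball
        v = np.zeros(d)                     # optimum is a vertex: +-radius * e_i
        v[i] = -radius * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)             # standard diminishing step size
        x = (1 - gamma) * x + gamma * v     # iterate stays at most (t+1)-sparse
    return x
```

Each update moves toward a single coordinate vertex, which is why sparse inputs can be exploited so effectively by this method.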
arXiv Detail & Related papers (2023-10-30T19:52:43Z)
- Hardness and Algorithms for Robust and Sparse Optimization [17.842787715567436]
We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression.
Specifically, the sparse linear regression problem seeks a $k$-sparse vector $x \in \mathbb{R}^d$ to minimize $\|Ax-b\|$.
The robust linear regression problem seeks a set $S$ that ignores at most $k$ rows and a vector $x$ to minimize $\|(Ax-b)_S\|$.
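To make the second objective concrete, here is a small sketch of evaluating it (with the $\ell_2$ norm assumed, since the entry leaves the norm unspecified); minimizing it over $x$ is the hard problem the paper studies.

```python
import numpy as np

def robust_residual_norm(A, b, x, k):
    """||(Ax - b)_S||_2 where S drops the k largest-magnitude residuals."""
    r = np.abs(A @ x - b)
    kept = np.sort(r)[: len(r) - k]   # ignore the k worst-fit rows
    return float(np.linalg.norm(kept))
```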
arXiv Detail & Related papers (2022-06-29T01:40:38Z)
- Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
- Oblivious sketching for logistic regression [72.42202783677811]
We present the first data oblivious sketch for logistic regression.
Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.
arXiv Detail & Related papers (2021-07-14T11:29:26Z)
- Semi-supervised Active Regression [21.51757844385258]
This paper studies the use of partially labelled, potentially biased data for learning tasks.
The learner has access to a dataset $X \in \mathbb{R}^{n \times d}$ and aims to approximately solve $\min_{\beta \in \mathbb{R}^d} \|X\beta - Y\|_2^2$ while making few additional label queries.
arXiv Detail & Related papers (2021-06-12T03:28:43Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.